Re: MARC vs XMLMARC

From: Houghton,Andrew <houghtoa_at_nyob>
Date: Sat, 26 May 2007 20:14:01 -0400
To: NGC4LIB_at_listserv.nd.edu
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Suzanne Pilsk
> Sent: 25 May, 2007 21:54
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] MARC vs XMLMARC
>
> The issues I have always had with this discussion I think can
> be solved by someone who understands this a whole lot better
> than me but
> 1) The MARC validates along MARC rules - subfields that are
> not valid in a tag will not work in proper software that
> knows MARC. Required pairing of tags, etc.  MARCXML current
> schema or dtd or whatever it is, does NOT have built into it
> this validation. You can put "silly" subfields etc in tags.
> It doesn't quite know when you are kidding around.  I bet
> that can be fixed with proper programming and the
> schema/dtd/whatever it is written "tighter".
> Of course that makes one ask "Do we want that rigid of a
> system?"  There can be a pretty clear argument that it works
> better when these rules are really followed.

Having been OCLC's representative to the meeting that formalized
the MARC-XML standard at the Library of Congress, I feel somewhat
compelled to provide some historical context to the statement, or
frustration, you express.

The MARC-XML schema was viewed as a base schema for the purpose
of transporting *any* MARC (ISO 2709) data in *any* format.  The
only assumptions made were that the ISO 2709 records must have
two indicator values, a one character subfield code and a three
character tag, which accounted for almost all MARC variants
running around in the wild at the time of the MARC-XML
standardization effort.  This meant, when the MARC-XML standard
was published, you could use it to transport CAN-MARC, UK-MARC,
UNIMARC, etc., as well as MARC-21, in the authorities,
bibliographic, classification, community or holdings formats.

One of the issues discussed at the LC meeting was whether or not
we should "bake" the validation into the schema.  LC's experience
with their MARC SGML DTD, which was created prior to the MARC-XML
standardization effort, gave them insight into "real" world MARC
records running around in the wild.  At the meeting LC described
how they built this beautiful MARC SGML DTD that had validation
for each MARC format and they were shocked when they tried to
validate their own records against their SGML DTD.

Even though MARC records may not pass validation for their
associated format, it doesn't mean that you never want to
transport them over the wire in XML.  An example of this
might be a MARC validation Web service.  A library may want
to encode that MARC record in their local system into MARC-XML
and send it to a MARC validation Web service that will send
back an "issues" list associated with that MARC record.

It's a somewhat lame example since if you had full validation in
the schema, then you probably wouldn't be sending the MARC record
to a validation Web service in the first place...  I'm sure
someone else might be able to provide a more meaningful example
where you might want to transport a MARC record even though it
would not pass validation.  I'm drawing a blank right now on
the examples expressed by RLG at the MARC-XML standardization
meeting.

The MARC-21 format standards do change on occasion.  The process
is handled by MARBI.  For example when CAN-MARC and UK-MARC were
integrated into the MARC-21 formats, there were changes and there
will be additional changes when the German MARC standard is
incorporated into MARC-21, as well.  If the MARC-XML standard had
"baked" the validation into the schema, then it would have to
issue a new schema for every MARBI change to the formats.

It's not impossible to change a schema, but to put it in context,
MARBI meets twice a year at ALA.  Worst case, it would have been possible
to have format changes occur twice a year since 2002 when the
MARC-XML standard was published.  So there would have been the
possibility, by now, of 10 different MARC-XML schemas running
around in the wild and anyone working with MARC-XML data would
have to take all 10 of those schemas into account.

It's also important to keep in mind "structure" vs. "content".
The MARC-XML standard provides the necessary schema for
describing MARC's "structure", not its "content", just as
ISO 2709 describes MARC's structure and not its content.
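
To make the structure vs. content point concrete, here is a minimal
sketch in Python that serializes a record into MARC-XML using only the
standard library.  The function name to_marcxml and the sample values
are made up for illustration.  The subfield code "z" is picked as a
deliberately "silly" one for field 245: the code only builds the
structure (leader, controlfield, datafield, subfield) and never asks
whether the content makes sense for any particular MARC format.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC-XML documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def to_marcxml(leader, control_fields, data_fields):
    """Serialize one record to a MARC-XML string (hypothetical helper).

    control_fields: list of (tag, value)
    data_fields:    list of (tag, ind1, ind2, [(code, value), ...])
    Nothing here checks whether a tag or subfield is defined in any
    MARC format; only the structure is expressed.
    """
    ET.register_namespace("", MARC_NS)  # emit a default namespace
    record = ET.Element(f"{{{MARC_NS}}}record")
    ET.SubElement(record, f"{{{MARC_NS}}}leader").text = leader
    for tag, value in control_fields:
        cf = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": tag})
        cf.text = value
    for tag, ind1, ind2, subfields in data_fields:
        df = ET.SubElement(
            record,
            f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": code})
            sf.text = value
    return ET.tostring(record, encoding="unicode")


xml_text = to_marcxml(
    "00714cam a2200205 a 4500",                  # made-up leader
    [("001", "12345")],                          # made-up control number
    [("245", "1", "0", [("a", "A title"), ("z", "just kidding")])],
)
print(xml_text)
```

The "silly" subfield travels over the wire like any other, which is
exactly what a base transport schema is meant to allow.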

One of the issues we discussed at the LC meeting was whether
there was a need to develop additional schemas, one for each
MARC-21 format, that would extend the base MARC-XML schema
with the appropriate validation.  It was decided that there
wasn't a pressing need, at the time, to invest in creating
those additional schemas.  Those format-specific schemas
could always be built at a later date and, to my knowledge,
no one (OCLC, RLG, LC, local system vendors, etc.) has ever
built them.

The MARC-XML schema actually does do some validation on the
contents of the leader, tags, indicators and subfield codes,
but it only places restrictions on their length and the
usable characters to ensure compatibility with almost all
MARC standards in the wild.  For example:

1) The leader must be 24 characters long and conform to a
   specific pattern, e.g., record length and base address
   must be in the right places.
2) A control field tag must be three characters long and must
   start with "00" and be followed by a single digit or
   alphabetic character (a-z or A-Z).
3) A data field tag must be three characters long and must
   contain only certain characters in a specific pattern.
4) An indicator value must be one character long and only
   contain a digit, lowercase alphabetic (a-z), or space.
5) A subfield code must be one character long and contain only
   certain characters.
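
The five checks above can be sketched as a handful of regular
expressions.  This is only an illustration in Python, not the
expressions from the published schema: the leader pattern assumes the
record length (positions 0-4) and base address (positions 12-16) are
digits, and the data field tag and subfield code patterns are
simplified guesses, since the list above only says "certain
characters".  The function name structural_issues is hypothetical.

```python
import re

# Illustrative patterns only; the published schema's expressions may differ.
LEADER = re.compile(r"[0-9]{5}.{7}[0-9]{5}.{7}")   # 5 + 7 + 5 + 7 = 24 chars
CONTROL_TAG = re.compile(r"00[0-9A-Za-z]")         # "00" + digit or letter
DATA_TAG = re.compile(r"(?!00)[0-9A-Za-z]{3}")     # assumption: not a 00X tag
INDICATOR = re.compile(r"[0-9a-z ]")               # digit, a-z, or space
SUBFIELD_CODE = re.compile(r"[0-9A-Za-z]")         # assumption: alphanumeric


def structural_issues(leader, control_tags, data_fields):
    """Return a list of structural problems; an empty list means none found.

    data_fields: list of (tag, ind1, ind2, [subfield codes])
    """
    issues = []
    if not LEADER.fullmatch(leader):
        issues.append("leader")
    for tag in control_tags:
        if not CONTROL_TAG.fullmatch(tag):
            issues.append(f"control field tag {tag!r}")
    for tag, ind1, ind2, codes in data_fields:
        if not DATA_TAG.fullmatch(tag):
            issues.append(f"data field tag {tag!r}")
        for ind in (ind1, ind2):
            if not INDICATOR.fullmatch(ind):
                issues.append(f"indicator {ind!r} in {tag}")
        for code in codes:
            if not SUBFIELD_CODE.fullmatch(code):
                issues.append(f"subfield code {code!r} in {tag}")
    return issues


# A "silly" subfield such as z in a 245 field is structurally fine.
ok = structural_issues(
    "00714cam a2200205 a 4500", ["001", "008"], [("245", "1", "0", ["a", "z"])]
)
bad = structural_issues("short", ["010"], [("24", "A", "0", ["ab"])])
print(ok)
print(bad)
```

Note that nothing here knows which subfields belong to which tag; that
kind of check would belong in a format-specific layer on top.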

> 2) "Title" is English.  "245" is more internationally
> recognized.  I can mark up things in MARC and it can be
> understood by a fellow MARC literate person whose language is
> different. And I can tell what is being described by someone
> else even if I can not speak their language. I've looked at
> records where I do not know the meaning of the words, but I
> know a lot about what it is that the person was describing by
> the tagging.  That helps me a lot.

Be very careful when using that phrase "internationally
recognized".  Your statement only applies when you are talking
about MARC-21!!  In UNIMARC the "245" field isn't the
title; if memory serves me, the title in UNIMARC is
a 5XX field.


Andy.
Received on Sat May 26 2007 - 18:05:41 EDT