Re: MARC vs XMLMARC

From: Stephens Owen <owen.stephens_at_nyob>
Date: Sat, 26 May 2007 15:26:59 +0100
To: NGC4LIB_at_listserv.nd.edu
> Suzanne wrote:
> 1) The MARC validates along MARC rules - subfields that are not valid in a
> tag will not work in proper software that knows MARC. Required pairing of
> tags, etc.  MARCXML current schema or dtd or whatever it is, does NOT have
> built into it this validation. You can put "silly" subfields etc in tags. It
> doesn't quite know when you are kidding around.  I bet that can be fixed
> with proper programming and the schema/dtd/whatever it is written "tighter".
> Of course that makes one ask "Do we want that rigid a system?"  There can
> be a pretty clear argument that it works better when these rules are really
> followed.
> 
>>> But I don't think validation is built into the MARC record. If you coded
>>> a record in MARC format by hand, you could code all kinds of nonsense in. It
>>> is the applications we use that validate. In some cases this can be tweaked:
>>> for example, the Ex Libris product Aleph lets you decide how severe each
>>> kind of abuse of the MARC record is. Some generate warnings, some stop you
>>> saving records. Using XML you could look at developing a schema, but you
>>> could also avoid this rigidity by keeping the rules in the applications
>>> you use.
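To make the "rules in the application" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the rule table, the severity names and the `check_record` / `can_save` helpers are hypothetical, the allowed-subfield lists are not the full MARC 21 definitions, and none of this reflects how Aleph actually stores its settings.

```python
# Hypothetical rule table: which subfield codes this application accepts per tag.
# Illustrative values only, not the complete MARC 21 definitions.
ALLOWED_SUBFIELDS = {
    "245": set("abchnp"),
    "260": set("abc"),
}

# Hypothetical severity settings: a site could change these without touching the data format.
SEVERITY = {"unknown_subfield": "warning", "missing_245": "error"}


def check_record(fields):
    """fields is a list of (tag, [(code, value), ...]); returns (severity, message) pairs."""
    problems = []
    if not any(tag == "245" for tag, _ in fields):
        problems.append((SEVERITY["missing_245"], "record has no 245 field"))
    for tag, subfields in fields:
        allowed = ALLOWED_SUBFIELDS.get(tag)
        if allowed is None:
            continue  # this sketch has no rule for the tag, so it is not checked
        for code, _ in subfields:
            if code not in allowed:
                problems.append(
                    (SEVERITY["unknown_subfield"], f"subfield ${code} not expected in {tag}")
                )
    return problems


def can_save(problems):
    """Warnings are reported but only errors block saving."""
    return all(severity != "error" for severity, _ in problems)
```

With these settings a "silly" subfield in a 245 produces a warning and the record still saves, while a record with no 245 at all is refused. Tightening or loosening that is a change to `SEVERITY`, not to the record format.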
> 
> -----
> Suzanne wrote:
> 2) "Title" is English.  "245" is more internationally recognized.  I can
> mark up things in MARC and it can be understood by a fellow MARC literate
> person whose language is different. And I can tell what is being described
> by someone else even if I cannot speak their language. I've looked at
> records where I do not know the meaning of the words, but I know a lot about
> what it is that the person was describing by the tagging.  That helps me
> a lot.
> 
> I'm not saying that we don't need to improve - but I think we need to think
> about what some of our trade offs are when we move into the future. And make
> sure we understand them clearly.
> 
>>> If you look at the XML Frances supplies you can see it preserves the use
>>> of the MARC tags. Some might argue that we would be better off ditching
>>> these tags for human-readable ones, but for me this is a bit of a red
>>> herring. I don't really care if the tags are coded or not.
> 
> The key thing about the XML expression of the MARC record, as I think Frances
> is getting at, is that we would be buying into a community that is used to
> dealing with this. For example, the XML MARC record could be opened in a
> browser, Word, Excel etc. and they would make some kind of sense of it. Also,
> transforming it into another format is straightforward with XML Stylesheets
> (XSLT), and it is a relatively trivial task to transform an XML MARC record
> into (for example) several html based display versions (one for screen, one
> for printing etc.). And any programmer familiar with manipulating XML would be
> able to do it. The point has previously been made on this list that both
> traditional MARC and XML MARC are meant to be machine readable - so who cares,
> you write a program to parse the data, and then you do what you want in your
> application. This may be true, but it is the richness of the community out
> there working with XML that makes this attractive to me.
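As a rough illustration of how little library-specific knowledge this takes, here is a sketch that turns a MARCXML record into a simple HTML display. It uses Python's standard `xml.etree.ElementTree` rather than XSLT (the Python standard library has no XSLT processor), so it is a stand-in for the stylesheet approach, not an example of it. The sample is a trimmed copy of two fields from Frances's record, and the helper names and the HTML layout are my own assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by the MARCXML "slim" schema, as in the record quoted in this thread.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed copy of the quoted record: just the 100 and 245 fields.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Robinson, Thomas.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Anatomie of the English nvnnery at Lisbon in Portvgall</subfield>
    <subfield code="h">[microform].</subfield>
  </datafield>
</record>
</collection>"""


def subfields(record, tag, codes):
    """Collect the text of the given subfield codes from every field with this tag."""
    values = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sf in field.findall("marc:subfield", NS):
            if sf.get("code") in codes:
                values.append(sf.text or "")
    return values


def record_to_html(record):
    """Hypothetical screen display: title ($a and $b of 245) and main author ($a of 100)."""
    title = " ".join(subfields(record, "245", "ab"))
    author = " ".join(subfields(record, "100", "a"))
    return f'<div class="record"><h2>{escape(title)}</h2><p>{escape(author)}</p></div>'


root = ET.fromstring(SAMPLE)
pages = [record_to_html(r) for r in root.findall("marc:record", NS)]
```

A print version would just be a second function alongside `record_to_html`. Nothing here knows anything about MARC beyond the tag numbers.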
> 
> The point is that by needlessly (imo) sticking to traditional MARC as our
> format, we are making a statement (even if not deliberately) that libraries
> don't want to play nicely with others. We could change this, and we might
> start to see some interesting applications appearing from outside (as well as
> inside) the library community.
> 
> There might be an argument for going further than this, and the 'Murdering
> Marc'
> (http://www.arcknowledge.com/cgi-bin/namazu.cgi?query=murdering+marc&submit=Search%21&idxname=culture.libraries.ngc4lib&max=20&result=normal&sort=score)
> debate is clearly ongoing, but I believe that moving to XML expression of MARC
> would be a big step forward in terms of working with other communities.
> 
> Owen
> 
> On 5/25/07, Frances McNamara <f-mcnamara_at_uchicago.edu> wrote:
>> >
>> > In the discussion of MARC vs. RDF it seemed to me that people think of
>> > MARC as what they see on the screen of OCLC or your ILS.  But actually
>> > MARC looks like this:
>> >
>> > 01252nam  2200265K
>> > 
>> 45000010008000000050017000080070014000250070014000390070014000530080041000670
>> 35001200108035002000120040002000140049000900160100002200169245033700191260006
>> 20052830000520059053301480064261000340079083000600082494900300088490300090091
>> 4985006300923
>> > 4270827 20030530021400.0 hdrafa---baca hdrbfa---baaa hdrbfa---baba
>> > 000601r19221622xx
>> > h    a     000 0 eng d    a4270827    a(OCoLC)44163773
>> > aCGU cCGU dOCoLC    aCGUA 1  aRobinson, Thomas. 10 aAnatomie of the
>> > English nvnnery at Lisbon in Portvgall h[microform]. bDissected and laid
>> > open by one that was sometime a yonger brother of the couent: Who (if
>> > the grace of God had not preuented him) might have growne as old in a
>> > wicked life as the oldest amongst them. cPub. by authoritie. (By Thomas
>> > Robinson of King's Lynn). 1622.    aKing's Lynn : bThew and son, pub. by
>> > E.M. Beloe, c[1922?]    a2 p. l., bfacsim. (4 p. l., 32 p.) 1
>> > l. c28 cm.    aMicrofilm. bChicago : cUniversity of Chicago
>> > Library, d2000. e1 microfilm reel ; 35 mm. f(History of religions
>> > preservation project, MN41672.1) 20 aSion House (Lisbon, Portugal)
>> > 0 aHistory of religions preservation project ; vMN41672.1.
>> > aJRL cMicSLN e41672.1 gc.1    aMARS
>> > a microfm 41672.1 bc.1 cUCSLN dJRL eMic rmq7090430 sn tstks
>> >
>> >
>> > An application has to know the secret of how to interpret this to give
>> > you the display you are used to seeing.  Needless to say, Microsoft and
>> > other big commercial vendors wouldn't bother to support a format like
>> > this.  So you have to have something like the "MARC Breaker" of the
>> > little free MarcEdit program to convert this to something you might
>> > use.  MARC in and of itself can be a barrier to being able to use
>> > records with other computer programs.
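For anyone curious what that "secret" amounts to, here is a minimal sketch of it in Python. The layout it relies on is the standard MARC one: a 24-character leader, a directory of 12-character entries (tag, field length, starting position), then the field data separated by control characters. The record in the email has lost those control characters in transit, so the sketch builds a tiny record of its own to parse. The function names are mine, the leader values are simplified, and treating the data as UTF-8 is an assumption, since real records are often MARC-8.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SD = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Assemble a small test record from (tag, body) pairs; leader values are simplified."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(data) + 1     # plus the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Read the leader and directory, then slice each field out of the data area."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])              # base address of data
    directory = raw[24:base - 1]           # stop before the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop the field terminator
        if tag < "010":
            # Control fields (001-009) have no indicators or subfields.
            fields.append((tag, body.decode("utf-8")))
        else:
            indicators = body[:2].decode("ascii")
            subs = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                for chunk in body[2:].split(SD)[1:]
            ]
            fields.append((tag, indicators, subs))
    return fields


sample = build_record([
    ("001", b"4270827"),
    ("245", b"10" + SD + b"aAnatomie of the English nvnnery" + SD + b"h[microform]."),
])
parsed = parse_record(sample)
```

The first directory entry this produces, `001000800000`, is the same string that opens the directory in the raw record above: tag 001, length 8, starting at offset 0.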
>> >
>> > XML, though, has rules that will let lots of software written for lots
>> > of things be able to handle it.  So whether you are using "245" or
>> > "title" to identify the title part, it really is easier for a
>> > nonlibrary-specific program to find and use the information.  The XML
>> > version below is more verbose, but storage and processing power are now
>> > much cheaper than they were when MARC was developed.  If you
>> > limit yourself to the MARC format you limit what programs you can use to
>> > manipulate the data.  I'm sure that's not the only aspect of this that
>> > people were talking about but it is useful to remember that that is what
>> > MARC really looks like.  I'll put the XML below for comparison, or you
>> > can ignore it.  I think with RDF it may be better in the future to use a
>> > word like "title" rather than "245" because software in the "semantic
>> > web" will be more likely to be able to read real words and make
>> > inferences from them.  At least I think that is the intention.
>> >
>> > Frances McNamara
>> > University of Chicago Library
>> >
>> >
>> > <?xml version="1.0" encoding="UTF-8"?><collection
>> > xmlns="http://www.loc.gov/MARC21/slim">
>> > <record>
>> >    <leader>01252nam a2200265K  4500</leader>
>> >    <controlfield tag="005">20030530021400.0</controlfield>
>> >    <controlfield tag="007">hdrafa---baca</controlfield>
>> >    <controlfield tag="007">hdrbfa---baaa</controlfield>
>> >    <controlfield tag="007">hdrbfa---baba</controlfield>
>> >    <controlfield tag="008">000601r19221622xx h    a     000 0 eng
>> > d</controlfield>
>> >    <datafield tag="035" ind1=" " ind2=" ">
>> >      <subfield code="a">4270827</subfield>
>> >    </datafield>
>> >    <datafield tag="035" ind1=" " ind2=" ">
>> >      <subfield code="a">4270827</subfield>
>> >    </datafield>
>> >    <datafield tag="035" ind1=" " ind2=" ">
>> >      <subfield code="a">(OCoLC)44163773</subfield>
>> >    </datafield>
>> >    <datafield tag="040" ind1=" " ind2=" ">
>> >      <subfield code="a">CGU</subfield>
>> >      <subfield code="c">CGU</subfield>
>> >      <subfield code="d">OCoLC</subfield>
>> >    </datafield>
>> >    <datafield tag="049" ind1=" " ind2=" ">
>> >      <subfield code="a">CGUA</subfield>
>> >    </datafield>
>> >    <datafield tag="100" ind1="1" ind2=" ">
>> >      <subfield code="a">Robinson, Thomas.</subfield>
>> >    </datafield>
>> >    <datafield tag="245" ind1="1" ind2="0">
>> >      <subfield code="a">Anatomie of the English nvnnery at Lisbon in
>> > Portvgall</subfield>
>> >      <subfield code="h">[microform].</subfield>
>> >      <subfield code="b">Dissected and laid open by one that was
>> > sometime a yonger brother of the couent: Who (if the grace of God had
>> > not preuented him) might have growne as old in a wicked life as the
>> > oldest amongst them.</subfield>
>> >      <subfield code="c">Pub. by authoritie. (By Thomas Robinson of
>> > King's Lynn). 1622.</subfield>
>> >    </datafield>
>> >    <datafield tag="260" ind1=" " ind2=" ">
>> >      <subfield code="a">King's Lynn :</subfield>
>> >      <subfield code="b">Thew and son, pub. by E.M. Beloe,</subfield>
>> >      <subfield code="c">[1922?]</subfield>
>> >    </datafield>
>> >    <datafield tag="300" ind1=" " ind2=" ">
>> >      <subfield code="a">2 p. l.,</subfield>
>> >      <subfield code="b">facsim. (4 p. l., 32 p.) 1 l.</subfield>
>> >      <subfield code="c">28 cm.</subfield>
>> >    </datafield>
>> >    <datafield tag="533" ind1=" " ind2=" ">
>> >      <subfield code="a">Microfilm.</subfield>
>> >      <subfield code="b">Chicago :</subfield>
>> >      <subfield code="c">University of Chicago Library,</subfield>
>> >      <subfield code="d">2000.</subfield>
>> >      <subfield code="e">1 microfilm reel ; 35 mm.</subfield>
>> >      <subfield code="f">(History of religions preservation project,
>> > MN41672.1)</subfield>
>> >    </datafield>
>> >    <datafield tag="610" ind1="2" ind2="0">
>> >      <subfield code="a">Sion House (Lisbon, Portugal)</subfield>
>> >    </datafield>
>> >    <datafield tag="830" ind1=" " ind2="0">
>> >      <subfield code="a">History of religions preservation
>> > project;</subfield>
>> >      <subfield code="v">MN41672.1.</subfield>
>> >    </datafield>
>> >    <datafield tag="949" ind1=" " ind2=" ">
>> >      <subfield code="a">JRL</subfield>
>> >      <subfield code="c">MicSLN</subfield>
>> >      <subfield code="e">41672.1</subfield>
>> >      <subfield code="g">c.1</subfield>
>> >    </datafield>
>> >    <datafield tag="903" ind1=" " ind2=" ">
>> >      <subfield code="a">MARS</subfield>
>> >    </datafield>
>> >    <datafield tag="985" ind1=" " ind2=" ">
>> >      <subfield code="a">&#24;microfm&#25;41672.1</subfield>
>> >      <subfield code="b">c.1</subfield>
>> >      <subfield code="c">UCSLN</subfield>
>> >      <subfield code="d">JRL</subfield>
>> >      <subfield code="e">Mic</subfield>
>> >      <subfield code="r">mq7090430</subfield>
>> >      <subfield code="s">n</subfield>
>> >      <subfield code="t">stks</subfield>
>> >    </datafield>
>> > </record>
>> > </collection>
>> >
> 


Owen Stephens
E-Strategy Co-ordinator
Royal Holloway, University of London
Egham
Surrey
TW20 0EX
Tel: 01784 443331
Email: owen.stephens_at_rhul.ac.uk
Received on Sat May 26 2007 - 08:24:51 EDT