Re: MARC structure (Was: Re: Ceci n'

From: Weinheimer Jim <j.weinheimer_at_nyob> Date: Mon, 27 Aug 2007 10:24:54 +0200 To: NGC4LIB_at_listserv.nd.edu

We shouldn't compare a formatted version with an unformatted one. A real MARC format record is totally unintelligible. Here is the MARC record for "The Old Man and the Sea" (some special characters may not make it through):
00857cam a2200277   4500001000800000005001700008008004100025035002100066906004500087010001700132040003000149043001200179050003100191082001300222100003500235245002900270260005100299300002600350650002200376650002400398650003000422651001900452655002700471655002300498991005800521413818720040414180304.0690718s1963    rurc          000 1 eng    9(DLC)   77004883  a7bcbccorignewduencipf19gy-gencatlg  a   77004883   aDLCcDLCdDLCdOCoLCdDLC  anwcu---00aPZ3.H3736bOl5aPS3515.E3700a813/.5/21 aHemingway, Ernest,d1899-1961.14aThe old man and the sea.  a[Moscow,bForeign Languages Pub. House,c1963]  a109 p.bport.c20 cm. 0aFishersvFiction. 0aOlder menvFiction. 0aMale friendshipvFiction. 0aCubavFiction. 7aBildungsromane.2gsafd 7aAllegories.2gsafd  bc-GenCollhPZ3.H3736iOl5p00017613236tCopy 1wBOOKS
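
The record is less opaque once you know the layout: a 24-character leader, then a directory of 12-character entries (tag, field length, start offset), then the data. Below is a minimal Python sketch that decodes only the leader and directory of the record above. The names `RAW`, `parse_leader` and `parse_directory` are made up for this illustration. The field and subfield separator characters appear to have been lost in the record as posted, so the sketch does not try to slice the data portion.

```python
# Leader (24 chars) plus directory (21 entries of 12 chars), copied from the record above.
RAW = (
    "00857cam a2200277   4500"
    "001000800000" "005001700008" "008004100025" "035002100066"
    "906004500087" "010001700132" "040003000149" "043001200179"
    "050003100191" "082001300222" "100003500235" "245002900270"
    "260005100299" "300002600350" "650002200376" "650002400398"
    "650003000422" "651001900452" "655002700471" "655002300498"
    "991005800521"
)


def parse_leader(raw):
    """Pull the two numbers needed to walk the record out of the leader."""
    return {
        "record_length": int(raw[0:5]),   # total length of the record
        "base_address": int(raw[12:17]),  # offset at which the field data starts
    }


def parse_directory(raw, base_address):
    """Split the directory into (tag, field length, start offset) tuples."""
    # The directory runs from position 24 up to the terminator just before the data.
    directory = raw[24:base_address - 1]
    return [
        (directory[i:i + 3], int(directory[i + 3:i + 7]), int(directory[i + 7:i + 12]))
        for i in range(0, len(directory), 12)
    ]


leader = parse_leader(RAW)
entries = parse_directory(RAW, leader["base_address"])
for tag, length, start in entries:
    print(tag, length, start)
```

Each printed line gives a tag, the length of that field, and where it starts relative to the base address; the title field 245, for instance, is listed as 29 characters long starting at offset 270.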

The advantage of the MARC format above is that it takes less computer memory, but this is not much of a consideration anymore. The advantage of XML is not the XML per se; it is that XML works with XSLT (style sheets) and can therefore be transformed into whatever you want: a Microsoft document, a PDF, UNIMARC, or anything else. You could probably even turn it into an image.
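
To make the "turn it into anything" point concrete, here is a rough Python sketch that converts the MARCXML-style fragment quoted further down in this message into the compact tagged line. Python's standard library stands in for XSLT here only because it ships no XSLT processor; the function name `datafield_to_line` is invented for this example, and the fragment has no XML namespace, which real MARCXML would have.

```python
import xml.etree.ElementTree as ET

# The fragment quoted later in this message, without a namespace.
MARCXML = """<record>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">[Interview with Keith McCance]</subfield>
    <subfield code="h">[sound recording] /</subfield>
    <subfield code="c">[Interviewer : Bronwyn Benn].</subfield>
  </datafield>
</record>"""


def datafield_to_line(field):
    """Render one <datafield> as 'TAG II $a...$b...'."""
    subfields = "".join(
        f"${sf.get('code')}{sf.text or ''}" for sf in field.findall("subfield")
    )
    return f"{field.get('tag')} {field.get('ind1')}{field.get('ind2')} {subfields}"


record = ET.fromstring(MARCXML)
lines = [datafield_to_line(field) for field in record.findall("datafield")]
print("\n".join(lines))
```

The same tree could just as well be written out as HTML or plain text by swapping the formatting function, which is the kind of flexibility a style sheet gives you.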

Also, with programs such as Lucene and Zebra, we may not even need databases anymore, since the XML records can be indexed and searched very quickly and powerfully. For example, take a look at the AGRIS database that I worked on at http://www.fao.org/agris/Centre.asp?Menu_1ID=DB&Menu_2ID=DB1&Language=EN&Content=http://www.fao.org/agris/search?Language=EN

There are millions of records here, and it is not in a database. Notice the fast response time!
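
The idea behind searching records without a database can be sketched with a toy inverted index, the basic structure that search engines such as Lucene are built around. This is only an illustration of the principle, not how Lucene or Zebra are actually implemented; the two sample records and the names `build_index` and `search` are invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two made-up XML records standing in for a real collection.
RECORDS = {
    "rec1": "<record><title>The old man and the sea</title>"
            "<subject>Fishers</subject></record>",
    "rec2": "<record><title>Interview with Keith McCance</title>"
            "<subject>Sound recording</subject></record>",
}


def build_index(records):
    """Map every lower-cased word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, xml_text in records.items():
        for element in ET.fromstring(xml_text).iter():
            for token in re.findall(r"\w+", (element.text or "").lower()):
                index[token].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word of the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(RECORDS)
print(search(index, "old sea"))
```

A lookup touches only the sets for the query words, never the full records, which is why this kind of index stays fast as the collection grows.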

And the greatest thing is: it's all open-source! The new implementations of Koha use Zebra, which I am trying to implement now.

Jim Weinheimer

> Alexander Johannesen wrote:
> >
> >
> > Hmm. What about XML as a standard is not elegant?
> Indeed it isn't. A format that gobbles up more bytes for tagging than
> the data it wraps cannot be elegant. Even gift wrappings nowadays have
> to be ecologically sound.
>
> >
> > What, exactly, is the difference between ;
> >
> > <record>
> >     <datafield tag="245" ind1="1" ind2="0">
> >       <subfield code="a">[Interview with Keith McCance]</subfield>
> >       <subfield code="h">[sound recording] /</subfield>
> >       <subfield code="c">[Interviewer : Bronwyn Benn].</subfield>
> >     </datafield>
> > </record>
> >
> You don't see what I mean? 31 extra bytes for every subfield rather than
> 2? Where is this more elegant than
>
> 245 10 $aInterview with Keith McCance$h[sound recording]$c[Interviewer :
> Bronwyn Benn]   ?
>
> > and ;
> >
> > <record>
> >     <title>
> >       <main>Interview with Keith McCance</main>
> >       <media>sound recording</media>
> >       <responsible>Interviewer : Bronwyn Benn</responsible>
> >     </title>
> > </record>
> >
> The difference is the language-tied tags. They are not international.
> Only numbers are. Terminology changes, and then there you are with
> your nice outdated tags. Numbers resist change.
>
> B.Eversberg