Re: After MARC...MODS?

From: Walker, David <dwalker_at_nyob>
Date: Tue, 20 Apr 2010 06:55:50 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU

> After struggling for many hours yesterday with MARC::XML 
> and MARC4J (the only tools available to me), I'm still stuck.

MJ, have you tried using yaz-marcdump [1], which comes with the yaz toolkit [2]?

You can invoke that from the command line to convert your MARC to MARC-XML, and then use XSLT to get the MARC-XML to MODS.

Easy enough to then put that all in a script in the language of your choice to automate the whole thing.
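
Something along these lines, as a minimal sketch rather than a tested recipe. It assumes yaz-marcdump and xsltproc are both on your PATH, that your records are in MARC-8 (change the -f value if they are already UTF-8), and that you have a MARCXML-to-MODS stylesheet on hand, such as the LoC one you link to. The function names and output file names are made up for illustration.

```python
import subprocess
from pathlib import Path


def marcdump_cmd(marc_file):
    """Build the yaz-marcdump call that writes MARC-XML to stdout."""
    # -i / -o pick the input and output formats; -f / -t convert the charset.
    # MARC-8 input is an assumption; use "-f", "UTF-8" for UTF-8 records.
    return ["yaz-marcdump", "-i", "marc", "-o", "marcxml",
            "-f", "MARC-8", "-t", "UTF-8", marc_file]


def xslt_cmd(stylesheet, marcxml_file, mods_file):
    """Build the xsltproc call that applies the stylesheet to the MARC-XML."""
    return ["xsltproc", "-o", mods_file, stylesheet, marcxml_file]


def marc_to_mods(marc_file, stylesheet, out_dir="."):
    """Run both steps and return the path of the resulting MODS file."""
    out = Path(out_dir)
    stem = Path(marc_file).stem
    marcxml = out / (stem + ".marcxml.xml")   # hypothetical naming scheme
    mods = out / (stem + ".mods.xml")

    # Step 1: binary MARC -> MARC-XML (yaz-marcdump prints to stdout).
    with open(marcxml, "wb") as fh:
        subprocess.run(marcdump_cmd(marc_file), stdout=fh, check=True)

    # Step 2: MARC-XML -> MODS via XSLT.
    subprocess.run(xslt_cmd(stylesheet, str(marcxml), str(mods)), check=True)
    return mods


# Example call (paths are placeholders):
# marc_to_mods("records.mrc", "MARC21slim2MODS3-3.xsl", out_dir="converted")
```

With check=True, either step failing raises an exception, so a bad file stops the run instead of silently producing an empty MODS file.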

--Dave

[1] http://www.indexdata.com/yaz/doc/yaz-marcdump.html
[2] http://www.indexdata.com/yaz

==================
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu
________________________________________
From: Next generation catalogs for libraries [NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of MJ Suhonos [mj_at_SUHONOS.CA]
Sent: Tuesday, April 20, 2010 6:39 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] After MARC...MODS?

Thank you, Ross, for the "red herring" comment — I was beginning to get very discouraged by this thread; the MARC-vs-MODS debate, interesting as it is, seems symptomatic of other issues.

Besides, LoC has already summarized (many of) the issues for us:
http://www.loc.gov/standards/mods/mods-overview.html

From my perspective, the conversation seems to go roughly like this:

Cataloguers: We want something better than MARC.
LoC: Okay, did you have anything specific in mind?
Cataloguers: Not really, just... you know, better.
LoC: Okay, here you go <produces MODS>
Cataloguers: No, no, that's not good enough.  <insert reason du jour>
LoC: … ?!?!

(This is obviously simplified, but in my defence, MARC was conceived before I was, so I'm a bit late to the game.)

A couple of particular statements that I'd like to comment on:

> My reasons for being suspicious of MODS are because it STILL holds too closely to MARC, it's basically just a slightly prettified MARC.   It doesn't allow one to do _very_ much more than MARC does

I disagree — as a programmer/hacker who has almost zero knowledge of MARC (245? 700? 123? XYZ? What?), MODS lets me do some very useful things without having to learn a new language.  Especially for the reasons Eric and Ross mention regarding ISBD/AACR2 and parsing (though, Jonathan, as a hacker you must know this firsthand :-).

And don't take my word for it, just look at some of the CSL and Bibutils work being done by non-librarians with MODS:

http://code.haskell.org/citeproc-hs/
http://www.scripps.edu/~cdputnam/software/bibutils/

> I think the fundamental issue is that people want the coding to be "human-readable" and a well-trained cataloger is not considered "human." :-)
>
> Can you envision catalogers talking MODS to the same effect and
> efficiency they are now talking MARC? Efficiency matters.

There's the problem: cataloguers are *forced* to learn how to "talk MARC" and thus become superhuman.  MODS, in contrast, is readable by "mere" humans.  Why do we require cataloguers who aren't doing under-the-hood system maintenance to understand MARC?

For those who think MODS "doesn't go far enough": when I showed the cataloguers in my department a slide of MARCXML, they could generally wrap their heads around it.  When I showed the same record in MODS, their reaction was more mixed, and less temperate.

So, if you mean "go far enough to change many cataloguers' thinking", MODS is pretty radical.  If you mean "go far enough to break from the hierarchical record concept", then no.  But then, I'm not sure we'd want to use MODS for that anyway.

When I see a page such as http://www.loc.gov/marc/bibliographic/bd20x24x.html, what I see is a crosswalk.  A cataloguer with item-in-hand wants to enter metadata for "Uniform Title" based on the rules of practice they have.  Why should they know or even care whether it's coded internally as 240 or <titleInfo type="uniform">?

And of course this is where the tools issue comes in:

> If you in your old tool put in this and that field with these
> sub-fields and values, it should be trivial for any tool to read it
> (it's supposed to be machine readable, no?) and parse it and put it
> into any back-end model you want.
>
> Still, these are all issues that take place behind the scenes. There would be little or no reason to change anyone's cataloging interface very much at all.

I couldn't agree more — a large part of the problem is that the tools are built so poorly that they don't successfully separate "cataloguing rules" from the underlying encoding format.  We saw this as a huge cultural issue when trying to build a cataloguing tool designed for different cultures (libraries, archives, museums) and thus, different descriptive rules, but with an underlying crosswalkable data store.  The librarians insisted on "seeing" MARC; the archivists insisted on "seeing" EAD, and so on.
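
To make the separation concrete, here is a minimal sketch using the Uniform Title example from earlier. The cataloguer only ever deals with the label; the encoding is looked up behind the scenes. The table layout and function names are hypothetical, and the only mapping taken from the discussion is 240 versus <titleInfo type="uniform"> (putting the title in subfield $a and in a <title> child is my assumption about the usual encoding).

```python
from xml.sax.saxutils import escape

# Hypothetical crosswalk table: one cataloguing label, two encodings.
# Only the "Uniform Title" row reflects the example in the message;
# the structure of the table is invented for illustration.
CROSSWALK = {
    "Uniform Title": {
        "marc": {"tag": "240", "subfield": "a"},
        "mods": {"element": "titleInfo", "type": "uniform", "child": "title"},
    },
}


def to_marc(label, value):
    """Return a (tag, subfield code, value) triple for the given label."""
    spec = CROSSWALK[label]["marc"]
    return (spec["tag"], spec["subfield"], value)


def to_mods(label, value):
    """Return a MODS XML fragment for the same label and value."""
    spec = CROSSWALK[label]["mods"]
    return '<{0} type="{1}"><{2}>{3}</{2}></{0}>'.format(
        spec["element"], spec["type"], spec["child"], escape(value))


# The data-entry side never mentions 240 or titleInfo:
entry = ("Uniform Title", "Hamlet")
print(to_marc(*entry))
print(to_mods(*entry))
```

Supporting another community's rules would mean adding rows or another encoding column to the table, not rebuilding the entry form.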

And lastly:

> It does surprise me somewhat that all you smart folks don't get
> together and create a framework for washing and cleaning up MARC
> records (making it convertible to whatever else you want or need)

I think what you're looking for is this:
http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-3.xsl

Obviously that's somewhat tongue-in-cheek, but if you look at the sheer amount of logic in that stylesheet (and bear in mind that it includes more from the "MARC slim Utils"), you'll see that a significant amount of "clean up" work is involved in crosswalking from MARC to MODS.

Case in point:  I have about 10M MARC21 records that I want to try some "interesting stuff" with.  The very first thing I need to do in order to a) use modern tools and b) make them readable for myself (a programmer and librarian, but not a trained cataloguer) is convert them to MODS.  After struggling for many hours yesterday with MARC::XML and MARC4J (the only tools available to me), I'm still stuck.  The data format sucks and the tools to free it into something modern also suck.

If I were starting with MODS instead, I'd have a much larger set of tools to manipulate the XML and to tinker with ways to break it into linked data, crosswalk it into Dublin Core, etc.  (Aside: I'd like to serialize MODS into JSON, but unfortunately it's too tied to XML to do this easily; I'd love to hear ideas from anyone who might know how to do this.)
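
One possible starting point for the JSON question, as a minimal sketch only: walk the XML tree generically and keep element names, attributes, text and child order as explicit JSON keys, so that repeated elements and attributes such as type="uniform" survive. The key names ("name", "attrs", "text", "children") are an invented convention, not any standard MODS-in-JSON format, and the sketch drops namespaces and ignores mixed content.

```python
import json
import xml.etree.ElementTree as ET


def local(name):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return name.rsplit("}", 1)[-1]


def element_to_dict(elem):
    """Convert one element (and its subtree) to a plain dict."""
    node = {"name": local(elem.tag)}
    if elem.attrib:
        node["attrs"] = {local(k): v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    # A list keeps document order and allows repeated elements.
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def mods_to_json(xml_string):
    """Parse an XML string and return the JSON serialization."""
    root = ET.fromstring(xml_string)
    return json.dumps(element_to_dict(root), ensure_ascii=False, indent=2)


# Tiny hand-written sample, not a real record.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo type="uniform"><title>Hamlet</title></titleInfo>
</mods>"""

print(mods_to_json(SAMPLE))
```

The result is verbose, and that is the trade-off: a flatter, friendlier JSON shape would need MODS-specific decisions about which elements repeat and which attributes matter.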

I guess my point after all this ranting is that by our single criterion "better than MARC", MODS succeeds wildly.  It's imperfect, absolutely; and by design it still inherits MARC-like concepts, but we are richer for having it as a tool.  Why aren't we using it?  If it's not "good enough", then we have to be specific about what criteria *would* make it "good enough", and for what purposes.

MJ