Re: Why don't non-librarians value library data as highly as we do?

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Wed, 16 Sep 2009 15:33:32 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Jacobs, Jane W wrote:
>
> I'd argue that it wouldn't be hard at all.  To Flip OCLC records to MARC
> XML and grab what they want and enjoying munching it six ways to Sunday
> should all be child's play for the average Google programmer!
>   

As someone who routinely tries to do that as part of my job, and who 
_comes_ from a library background and spends quite a bit of time trying 
to learn MARC and the conventions for putting things in MARC (including 
but not limited to AACR2), this has simply not been my experience.  Just 
because someone is a "Google programmer" does not mean they can work 
magic with data that isn't there.

Certainly it's not the encoding format that's the problem; of course 
it's no problem converting to MARC-XML or whatever you want.  It's the 
schema: the way the data elements are either there or not, and how they 
are 'tagged' for recognition.

Frequently the answer to "How do I get this piece of data I want" is 
along the lines of: "Well, it'll be in this field, UNLESS this other 
field is X, in which case it'll be in field Y, UNLESS field Y is being 
used for Z (to figure out whether it is, look at fixed fields a, b, and 
c, the different combinations of all three of which determine that, but 
there's no guarantee they're filled out correctly).  Oh, and that's 
assuming it's a post-1972 record, in older records they did things 
entirely differently and put the data over in field N.  Oh, and ALL of 
that is assuming this is AACR2 data, the corpus also includes Rare Books 
and Manuscripts data, and those guys do things entirely differently, 
although it's still in MARC, you've got to look in this OTHER field for 
it.  First check fixed field q to see if it's RBM data, and hope fixed 
field q is right. Oh, and don't forget to check if it's encoded in UTF-8 
or MARC-8 by checking this other fixed field, which we know is wrong 
most of the time."
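
To make the last of those checks concrete, here is a rough Python sketch. 
It assumes the flag in question is Leader position 09 ('a' means 
UCS/Unicode, blank means MARC-8), and it assumes you have the raw ISO 2709 
bytes of one record in hand. The function name guess_encoding and the 
fallback heuristic are made up for illustration, not anybody's official 
method.

```python
def guess_encoding(raw: bytes) -> str:
    """Guess 'utf-8' or 'marc-8' for one raw MARC record (heuristic only)."""
    leader = raw[:24]
    # Leader/09: b"a" declares UCS/Unicode, blank declares MARC-8.
    declared = "utf-8" if leader[9:10] == b"a" else "marc-8"

    body = raw[24:]
    if not any(byte > 0x7F for byte in body):
        # Pure ASCII reads the same either way, so the flag cannot hurt us.
        return declared

    # The flag is often wrong, so trust the bytes over the declaration.
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "marc-8"
```

This is only a guess: a short MARC-8 string can happen to be valid UTF-8, 
so a record can still be misjudged, which is rather the point.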

I am seriously barely exaggerating.  The amount of knowledge you need to 
get all but the simplest data out of typical US library MARC is huge.  
That knowledge is documented in a half dozen (at least) different 
places, when it's documented at all and not just in 'cataloger 
tradition'.  And even once you DO figure out what's going on, the answer 
is often that exactly what you want isn't there.  (How do I know if an 
856 represents a full text link or just table of contents or just a 
publisher's site offering to sell me the book, again? Cause I'd really 
like to. How do I get machine-interpretable serials coverage 
information, so my software can answer the question of whether we hold 
1980 volume 20 issue 5 and if so in what location?  With the majority of 
US libraries' data, I simply can't.)
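
For the 856 question, about the best software can do is something like 
the following Python sketch. The second indicator values (0 = the 
resource, 1 = a version of it, 2 = a related resource) and the subfields 
$3, $z and $y are standard MARC 21; the Field856 class, the classify_856 
function, the label strings and the phrases being matched are all 
hypothetical, chosen only to illustrate the guesswork.

```python
from dataclasses import dataclass, field


@dataclass
class Field856:
    """Bare-bones stand-in for one 856 field (hypothetical structure)."""
    ind2: str                                      # second indicator
    subfields: dict = field(default_factory=dict)  # e.g. {"u": url, "3": text}


def classify_856(f: Field856) -> str:
    """Best-effort guess at what an 856 link points to."""
    # Free-text hints live in $3 (materials specified), $z (public note)
    # and $y (link text); none of them uses a controlled vocabulary.
    note = " ".join(f.subfields.get(code, "") for code in ("3", "z", "y")).lower()
    if "table of contents" in note:
        return "toc"
    if "publisher" in note:
        return "publisher"
    if f.ind2 in ("0", "1"):
        # The resource itself or a version of it: probably, not certainly,
        # full text.
        return "probably_full_text"
    if f.ind2 == "2":
        return "related"
    return "unknown"


link = Field856("2", {"u": "http://example.org/toc", "3": "Table of contents"})
print(classify_856(link))
```

Every branch is a guess from indicators that may be blank and notes that 
catalogers phrase however they like, so "unknown" and wrong answers are 
both routine.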

Jonathan