Re: Why don't non-librarians value library data as highly as we do?

From: Heather Christenson <heather.christenson_at_nyob>
Date: Thu, 17 Sep 2009 15:58:34 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU

Jonathan - You are right on!

On 9/16/09 12:33 PM, "Jonathan Rochkind" <rochkind_at_JHU.EDU> wrote:

> Just because someone is a "Google programmer" does not mean they can work
> magic with data that isn't there.

and

> Certainly it's not the encoding format that's a problem, of course it's
> no problem converting to MARC-XML or whatever you want.  It's the schema
> -- the way the data elements are either there or not, and how they are
> 'tagged' for recognition.

On 9/16/09 12:33 PM, "Jonathan Rochkind" <rochkind_at_JHU.EDU> wrote:

> Jacobs, Jane W wrote:
>> 
>> I'd argue that it wouldn't be hard at all.  Flipping OCLC records to
>> MARC XML, grabbing what they want, and munching it six ways to Sunday
>> should all be child's play for the average Google programmer!
>>   
> 
> As someone who routinely tries to do that as part of my job, and who
> _comes_ from a library background and spends quite a bit of time trying
> to learn MARC and the conventions for putting things in MARC (including
> but not limited to AACR2), this has simply not been my experience.  Just
> because someone is a "Google programmer" does not mean they can work
> magic with data that isn't there.
> 
> Certainly it's not the encoding format that's a problem, of course it's
> no problem converting to MARC-XML or whatever you want.  It's the schema
> -- the way the data elements are either there or not, and how they are
> 'tagged' for recognition.
> 
> Frequently the answer to "How do I get this piece of data I want" is
> along the lines of: "Well, it'll be in this field, UNLESS this other
> field is X, in which case it'll be in field Y, UNLESS field Y is being
> used for Z (to try to figure out whether it is, look at fixed fields a,
> b, and c, whose combinations determine that, but there's no guarantee
> they're filled out correctly).  Oh, and that's assuming it's a
> post-1972 record; in older records they did things entirely
> differently and put the data over in field N.  Oh, and ALL of
> that is assuming this is AACR2 data, the corpus also includes Rare Books
> and Manuscripts data, and those guys do things entirely differently,
> although it's still in MARC, you've got to look in this OTHER field for
> it.  First check fixed field q to see if it's RBM data, and hope fixed
> field q is right. Oh, and don't forget to check if it's encoded in UTF-8
> or MARC-8 by checking this other fixed field, which we know is wrong
> most of the time."
> 
> I am seriously barely exaggerating.  The amount of knowledge you need to
> get all but the simplest data out of typical US library MARC is huge.
> This knowledge is documented in a half dozen (at least) different
> places, when it's documented at all and not just in 'cataloger
> tradition', and even once you DO figure out what's going on, the answer
> is often that exactly what you want isn't there.  (How do I know if an
> 856 represents a full text link or just table of contents or just a
> publisher's site offering to sell me the book, again? Cause I'd really
> like to. How do I get machine-interpretable serials coverage
> information, so my software can answer the question of whether we hold
> 1980 volume 20 issue 5 and if so in what location?  With the majority of
> US libraries' data, I simply can't.)
> 
> Jonathan
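
To make the last item in Jonathan's list concrete, here is a minimal sketch
of what "don't trust the encoding flag" can look like in practice. It
assumes a raw MARC 21 record as bytes, where leader position 09 is 'a' for
UCS/Unicode and blank for MARC-8. The function name guess_encoding and the
fallback heuristic are illustrative assumptions, not anything from the
thread or from a particular library.

```python
LEADER_LENGTH = 24  # the MARC 21 leader is a fixed 24 bytes


def guess_encoding(record: bytes) -> str:
    """Guess whether a raw MARC 21 record is 'utf-8' or 'marc-8'.

    Hypothetical helper: the leader flag is only used as a tie-breaker,
    because it is treated here as unreliable.
    """
    # Leader/09: b"a" declares UCS/Unicode, blank declares MARC-8.
    declared = "utf-8" if record[9:10] == b"a" else "marc-8"
    body = record[LEADER_LENGTH:]

    # Pure 7-bit data is valid under either encoding, so the bytes give
    # no evidence and the declared value is all there is to go on.
    if all(byte < 0x80 for byte in body):
        return declared

    # High bytes that decode cleanly as UTF-8 are very likely UTF-8;
    # anything else is assumed to be MARC-8, whatever the leader says.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "marc-8"
    return "utf-8"
```

This is only a heuristic. It says nothing about whether the MARC-8 data
itself is well formed, and actually converting MARC-8 to Unicode needs a
separate library.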

-- 
Heather Christenson
Mass Digitization Project Manager
University of California
California Digital Library
http://www.cdlib.org/inside/projects/massdig/