Re: Why don't non-librarians value library data as highly as we do?

From: B.G. Sloan <bgsloan2_at_nyob>
Date: Wed, 16 Sep 2009 21:11:00 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU

Way back in the olden days I studied cataloging for a full semester under the tutelage of Michael Gorman, shortly after he had finished editing AACR2. It was a valuable experience.

A few years later I was writing some code (as well as writing supporting documentation for library catalogers) to translate MARC records into the format used for the University of Illinois' Library Computer System (LCS).

I wholeheartedly agree with Jonathan Rochkind's take on processing AACR2 and MARC records.

Bernie Sloan

--- On Wed, 9/16/09, Jonathan Rochkind <rochkind_at_JHU.EDU> wrote:

> From: Jonathan Rochkind <rochkind_at_JHU.EDU>
> Subject: Re: [NGC4LIB] Why don't non-librarians value library data as highly as we do?
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Wednesday, September 16, 2009, 3:33 PM
> Jacobs, Jane W wrote:
> > 
> > I'd argue that it wouldn't be hard at all.  To Flip OCLC records to MARC
> > XML and grab what they want and enjoying munching it six ways to Sunday
> > should all be child's play for the average Google programmer!
> >   
> 
> As someone who routinely tries to do that as part of my
> job, and who _comes_ from a library background and spends
> quite a bit of time trying to learn MARC and the conventions
> for putting things in MARC (including but not limited to
> AACR2), this has simply not been my experience.  Just
> because someone is a "Google programmer" does not mean they
> can work magic with data that isn't there.
> 
> Certainly it's not the encoding format that's a problem, of
> course it's no problem converting to MARC-XML or whatever
> you want.  It's the schema -- the way the data elements
> are either there or not, and how they are 'tagged' for
> recognition.
> 
> Frequently the answer to "How do I get this piece of data I
> want" is along the lines of: "Well, it'll be in this field,
> UNLESS this other field is X, in which case it'll be in
> field Y, UNLESS field Y is being used for Z (to try and
> figure out if it's Z, look at fixed fields a, b, and c, the
> different combinations of all three of which determine that,
> but there's no guarantee they're filled out correctly).
> Oh, and that's assuming it's a post-1972 record, in older
> records they did things entirely differently and put the
> data over in field N.  Oh, and ALL of that is assuming
> this is AACR2 data, the corpus also includes Rare Books and
> Manuscripts data, and those guys do things entirely
> differently, although it's still in MARC, you've got to look
> in this OTHER field for it.  First check fixed field q
> to see if it's RBM data, and hope fixed field q is right.
> Oh, and don't forget to check if it's encoded in UTF-8 or
> MARC-8 by checking this other fixed field, which we know is
> wrong most of the time."
> 
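
To make just the last check in that chain concrete (UTF-8 versus MARC-8), here is a minimal Python sketch. It assumes a raw MARC 21 record held as bytes, where Leader position 09 is 'a' for UCS/Unicode and blank for MARC-8. The function names are hypothetical, and the byte-sniffing fallback is an illustrative heuristic for coping with an unreliable flag, not anything prescribed by the MARC documentation:

```python
def declared_encoding(record: bytes) -> str:
    """Report what the leader claims (Leader/09: 'a' = Unicode, blank = MARC-8)."""
    if len(record) < 24:
        raise ValueError("a MARC 21 leader is 24 bytes long")
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"
    if flag == b" ":
        return "marc-8"
    return "unknown"


def sniff_encoding(record: bytes) -> str:
    """Guess the real encoding, because the leader flag is often wrong.

    Heuristic (an assumption, not a guarantee): bytes above 0x7F that do
    not form valid UTF-8 are treated as MARC-8.
    """
    if record.isascii():
        # Plain ASCII reads the same either way, so the flag cannot be
        # contradicted by the data; just pass it through.
        return declared_encoding(record)
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        return "marc-8"
    return "utf-8"


# A 24-byte dummy leader flagged as Unicode, followed by MARC-8 style bytes
# (0xE2 is the MARC-8 combining acute, placed before the base letter).
flagged_utf8 = b" " * 9 + b"a" + b" " * 14
mislabeled = flagged_utf8 + b"Caf\xe2e"
print(declared_encoding(mislabeled), sniff_encoding(mislabeled))
```

Even this one step needs a fallback, and the fallback is still a guess: a record in some other legacy encoding would also be reported as MARC-8.
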
> I am seriously barely exaggerating.  The amount of
> knowledge you need to get all but the simplest data out of
> typical US library MARC is huge, this knowledge is
> documented in a half dozen (at least) different places, when
> it's documented at all and not just in 'cataloger
> tradition', and even once you DO figure out what's going
> on---the answer is often that exactly what you want isn't
> there.  (How do I know if an 856 represents a full text
> link or just table of contents or just a publisher's site
> offering to sell me the book, again? Cause I'd really like
> to. How do I get machine-interpretable serials coverage
> information, so my software can answer the question of
> whether we hold 1980 volume 20 issue 5 and if so in what
> location?  With the majority of US libraries' data, I
> simply can't.)
> 
> Jonathan
>
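
The 856 question above can be illustrated the same way. This is a rough Python sketch, assuming the 856 field has already been parsed into its second indicator and subfields. The `Field856` class, the hint strings, and the result labels are all hypothetical. The only parts taken from MARC 21 itself are the second indicator values (0 = resource, 1 = version of resource, 2 = related resource) and the subfield codes ($3 materials specified, $z public note, $y link text):

```python
from dataclasses import dataclass, field


@dataclass
class Field856:
    """Hypothetical, already-parsed view of one 856 field."""
    ind2: str
    subfields: dict[str, str] = field(default_factory=dict)


# Assumed free-text hints; real records phrase these in many other ways.
TOC_HINTS = ("table of contents",)
PUBLISHER_HINTS = ("publisher description", "publisher")


def guess_link_kind(f: Field856) -> str:
    """Best-effort guess at what an 856 link points to."""
    note = " ".join(f.subfields.get(code, "") for code in ("3", "z", "y")).lower()
    if any(hint in note for hint in TOC_HINTS):
        return "table of contents"
    if any(hint in note for hint in PUBLISHER_HINTS):
        return "publisher information"
    if f.ind2 in ("0", "1"):
        # The indicator says the link is the resource or a version of it,
        # which still does not promise full text.
        return "possibly full text"
    return "unknown"


print(guess_link_kind(Field856("2", {"3": "Table of contents"})))
print(guess_link_kind(Field856("0", {"u": "http://example.org/item"})))
```

The best label the sketch can return for a real hit is "possibly full text", which is the point: nothing in the field states that outright, so software ends up pattern-matching on free-text notes.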