Re: Why don't non-librarians value library data as highly as we do?

From: Sharon Foster <fostersm1_at_nyob>
Date: Thu, 17 Sep 2009 09:08:57 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

I'm not a cataloger, but I was a software engineer. MARC/AACR2 parsing
is complex, and that's assuming the data really do follow the
standards. What if the original cataloger made a mistake or
misunderstood a portion of one of the standards?
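
As a rough illustration of what "don't assume the data follow the
standard" means in code, here is a minimal sketch around the encoding
flag mentioned in the quoted message below. It assumes MARC 21 binary
records, where Leader position 09 is "a" for UCS/Unicode and blank for
MARC-8. The function names are hypothetical, and the byte-level check
is only a heuristic, not a real MARC-8 validator.

```python
def declared_encoding(record: bytes) -> str:
    """Return the encoding the record's leader claims (Leader/09)."""
    if len(record) < 24:
        raise ValueError("a MARC 21 leader is 24 bytes long")
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"
    if flag == b" ":
        return "marc-8"
    return "unknown"


def probable_encoding(record: bytes) -> str:
    """Guess the encoding from the bytes themselves (heuristic only)."""
    body = record[24:]
    if all(byte < 0x80 for byte in body):
        return "ascii"  # pure ASCII reads the same under either scheme
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        # High-bit bytes that are not valid UTF-8: assume MARC-8.
        return "marc-8"
    return "utf-8"


def encoding_mismatch(record: bytes) -> bool:
    """True when the leader's claim disagrees with what the bytes suggest."""
    probable = probable_encoding(record)
    return probable != "ascii" and declared_encoding(record) != probable


# Hypothetical test data: a leader claiming Unicode, followed by bytes
# that are not valid UTF-8 (a high-bit byte followed by a plain letter).
leader_unicode = b"00000nam " + b"a" + b"2200000 a 4500"
suspect = leader_unicode + b"caf\xe2e"
print(encoding_mismatch(suspect))
```

The point of the sketch is only that the flag has to be cross-checked
before it can be used, and the same goes for nearly every other coded
value in the record.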

Sharon M. Foster, JD, MLS
Technology Librarian
http://firstgentrekkie.blogspot.com/
"Have you tried switching it off and on again?"

On Thu, Sep 17, 2009 at 12:11 AM, B.G. Sloan <bgsloan2_at_yahoo.com> wrote:
> Way back in the olden days I studied cataloging for a full semester under the tutelage of Michael Gorman, shortly after he had finished editing AACR2. It was a valuable experience.
>
> A few years later I was writing some code (as well as writing supporting documentation for library catalogers) to translate MARC records into the format used for the University of Illinois' Library Computer System (LCS).
>
> I wholeheartedly agree with Jonathan Rochkind's take on processing AACR2 and MARC records.
>
> Bernie Sloan
>
> --- On Wed, 9/16/09, Jonathan Rochkind <rochkind_at_JHU.EDU> wrote:
>
>> From: Jonathan Rochkind <rochkind_at_JHU.EDU>
>> Subject: Re: [NGC4LIB] Why don't non-librarians value library data as highly as we do?
>> To: NGC4LIB_at_LISTSERV.ND.EDU
>> Date: Wednesday, September 16, 2009, 3:33 PM
>> Jacobs, Jane W wrote:
>> >
>> > I'd argue that it wouldn't be hard at all.  To flip OCLC records to
>> > MARC XML, grab what they want, and enjoy munching it six ways to
>> > Sunday should all be child's play for the average Google programmer!
>> >
>>
>> As someone who routinely tries to do that as part of my
>> job, and who _comes_ from a library background and spends
>> quite a bit of time trying to learn MARC and the conventions
>> for putting things in MARC (including but not limited to
>> AACR2), this has simply not been my experience.  Just
>> because someone is a "Google programmer" does not mean they
>> can work magic with data that isn't there.
>>
>> Certainly it's not the encoding format that's the problem; of
>> course it's no problem converting to MARC-XML or whatever
>> you want.  It's the schema -- the way the data elements
>> are either there or not, and how they are 'tagged' for
>> recognition.
>>
>> Frequently the answer to "How do I get this piece of data I
>> want" is along the lines of: "Well, it'll be in this field,
>> UNLESS this other field is X, in which case it'll be in
>> field Y, UNLESS field Y is being used for Z (to try to
>> figure out whether it is, look at fixed fields a, b, and c, the
>> different combinations of all three of which determine that,
>> but there's no guarantee they're filled out correctly).
>> Oh, and that's assuming it's a post-1972 record; in older
>> records they did things entirely differently and put the
>> data over in field N.  Oh, and ALL of that is assuming
>> this is AACR2 data; the corpus also includes Rare Books and
>> Manuscripts data, and those guys do things entirely
>> differently. Although it's still in MARC, you've got to look
>> in this OTHER field for it.  First check fixed field q
>> to see if it's RBM data, and hope fixed field q is right.
>> Oh, and don't forget to check if it's encoded in UTF-8 or
>> MARC-8 by checking this other fixed field, which we know is
>> wrong most of the time."
>>
>> I am seriously barely exaggerating.  The amount of
>> knowledge you need to get all but the simplest data out of
>> typical US library MARC is huge. This knowledge is
>> documented in a half dozen (at least) different places, when
>> it's documented at all and not just in 'cataloger
>> tradition', and even once you DO figure out what's going
>> on, the answer is often that exactly what you want isn't
>> there.  (How do I know if an 856 represents a full text
>> link or just table of contents or just a publisher's site
>> offering to sell me the book, again? 'Cause I'd really like
>> to. How do I get machine-interpretable serials coverage
>> information, so my software can answer the question of
>> whether we hold 1980 volume 20 issue 5 and if so in what
>> location?  With the majority of US libraries' data, I
>> simply can't.)
>>
>> Jonathan
>>