Jacobs, Jane W wrote:
>
> I'd argue that it wouldn't be hard at all. To flip OCLC records to MARC
> XML, grab what they want, and enjoy munching it six ways to Sunday
> should all be child's play for the average Google programmer!
>
As someone who routinely tries to do that as part of my job, and who
_comes_ from a library background and spends quite a bit of time trying
to learn MARC and the conventions for putting things in MARC (including
but not limited to AACR2), this has simply not been my experience. Just
because someone is a "Google programmer" does not mean they can work
magic with data that isn't there.

Certainly it's not the encoding format that's the problem; of course
it's no problem converting to MARC-XML or whatever you want. It's the
schema -- the way the data elements are either there or not, and how
they are 'tagged' for recognition.

Frequently the answer to "How do I get this piece of data I want" is
along the lines of: "Well, it'll be in this field, UNLESS this other
field is X, in which case it'll be in field Y, UNLESS field Y is being
used for Z (to try and figure out if it's Z, look at fixed fields a, b,
and c, the different combinations of all three of which determine that,
but there's no guarantee they're filled out correctly). Oh, and that's
assuming it's a post-1972 record, in older records they did things
entirely differently and put the data over in field N. Oh, and ALL of
that is assuming this is AACR2 data, the corpus also includes Rare Books
and Manuscripts data, and those guys do things entirely differently,
although it's still in MARC, you've got to look in this OTHER field for
it. First check fixed field q to see if it's RBM data, and hope fixed
field q is right. Oh, and don't forget to check if it's encoded in UTF-8
or MARC-8 by checking this other fixed field, which we know is wrong
most of the time."
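
To make that chain of "UNLESS" concrete, here is a rough Python sketch of
what the lookup ends up looking like. Everything in it is a placeholder
mirroring the letters above: the record layout (a plain dict), the tags
"MAIN", "FLAG", "Y", "N" and "RBM", the value "r" in fixed field q, the
1972 cutoff comparison, and the contents of Z_COMBINATIONS are all
assumptions made up for illustration, not real MARC tags or values.

```python
# Hypothetical: combinations of fixed fields a, b, c that mean
# "field Y is being used for Z". Real rules would come from the docs.
Z_COMBINATIONS = {("1", "0", "0")}


def find_value(record):
    """Return (value, rule) for a record shaped like
    {"fixed": {...}, "fields": {tag: [values]}, "year": int}."""
    fixed = record.get("fixed", {})
    fields = record.get("fields", {})

    def first(tag):
        values = fields.get(tag) or []
        return values[0] if values else None

    # Rare Books and Manuscripts data: look in a different field,
    # and hope fixed field q is right.
    if fixed.get("q") == "r":
        return first("RBM"), "rbm"

    # Older records put the data somewhere else (cutoff is assumed).
    year = record.get("year")
    if year is not None and year < 1972:
        return first("N"), "pre-1972"

    # The usual field, UNLESS the other field is X.
    if first("FLAG") != "X":
        return first("MAIN"), "default"

    # Then field Y, UNLESS Y is being used for Z.
    combo = (fixed.get("a"), fixed.get("b"), fixed.get("c"))
    if combo in Z_COMBINATIONS:
        return None, "Y-holds-Z"
    return first("Y"), "field-Y"
```

Returning the rule name next to the value is a deliberate choice here:
when the fixed fields can't be trusted, it helps to know which branch
produced the answer.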

I am seriously barely exaggerating. The amount of knowledge you need to
get all but the simplest data out of typical US library MARC is huge;
this knowledge is documented in a half dozen (at least) different
places, when it's documented at all and not just in 'cataloger
tradition'; and even once you DO figure out what's going on, the answer
is often that exactly what you want isn't there. (How do I know if an
856 represents a full text link or just table of contents or just a
publisher's site offering to sell me the book, again? Cause I'd really
like to. How do I get machine-interpretable serials coverage
information, so my software can answer the question of whether we hold
1980 volume 20 issue 5 and if so in what location? With the majority of
US libraries' data, I simply can't.)

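
A rough sketch of the 856 problem, in Python. As far as I understand
MARC 21, the second indicator only says how the link relates to the
record (the table below may be incomplete), and $3 and $z are free
text. So the best software can do is guess from wording. The function
name, the dict-of-subfields input, and the phrases matched are all
assumptions for illustration.

```python
# Second indicator of 856, as I understand MARC 21. Nothing here says
# "full text" vs. "table of contents" vs. "publisher's sales page".
IND2_MEANING = {
    "0": "resource itself",
    "1": "version of resource",
    "2": "related resource",
    " ": "no information provided",
    "8": "no display constant generated",
}


def describe_856(indicator2, subfields):
    """Return (coded relationship, guessed link type).

    subfields is a hypothetical dict like {"u": url, "3": ..., "z": ...}.
    The guess relies on free-text notes, which are optional and
    inconsistently worded, so "unknown" is a common answer.
    """
    relation = IND2_MEANING.get(indicator2, "undefined value")
    note = " ".join(subfields.get(code, "") for code in ("3", "z")).lower()
    if "table of contents" in note:
        guess = "table of contents"
    elif "full text" in note:
        guess = "full text"
    else:
        guess = "unknown"
    return relation, guess
```

For a link with no note at all, `describe_856("0", {"u": "http://example.org/"})`
gives back a relationship but only "unknown" for the link type, which is
exactly the gap.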
Jonathan
Received on Wed Sep 16 2009 - 15:35:01 EDT