Jonathan - You are right on!
On 9/16/09 12:33 PM, "Jonathan Rochkind" <rochkind_at_JHU.EDU> wrote:
> Just because someone is a "Google programmer" does not mean they can work
> magic with data that isn't there.
and
> Certainly it's not the encoding format that's a problem, of course it's
> no problem converting to MARC-XML or whatever you want. It's the schema
> -- the way the data elements are either there or not, and how they are
> 'tagged' for recognition.
On 9/16/09 12:33 PM, "Jonathan Rochkind" <rochkind_at_JHU.EDU> wrote:
> Jacobs, Jane W wrote:
>>
>> I'd argue that it wouldn't be hard at all. Flipping OCLC records to MARC
>> XML, grabbing what they want, and munching it six ways to Sunday
>> should all be child's play for the average Google programmer!
>>
>
> As someone who routinely tries to do that as part of my job, and who
> _comes_ from a library background and spends quite a bit of time trying
> to learn MARC and the conventions for putting things in MARC (including
> but not limited to AACR2), this has simply not been my experience. Just
> because someone is a "Google programmer" does not mean they can work
> magic with data that isn't there.
>
> Certainly it's not the encoding format that's a problem, of course it's
> no problem converting to MARC-XML or whatever you want. It's the schema
> -- the way the data elements are either there or not, and how they are
> 'tagged' for recognition.
>
> Frequently the answer to "How do I get this piece of data I want" is
> along the lines of: "Well, it'll be in this field, UNLESS this other
> field is X, in which case it'll be in field Y, UNLESS field Y is being
> used for Z (to try to figure out Z, look at fixed fields a, b, and
> c, the different combinations of all three of which determine that, but
> there's no guarantee they're filled out correctly). Oh, and that's
> assuming it's a post-1972 record; in older records they did things
> entirely differently and put the data over in field N. Oh, and ALL of
> that is assuming this is AACR2 data; the corpus also includes Rare Books
> and Manuscripts data, and those guys do things entirely differently,
> although it's still in MARC, you've got to look in this OTHER field for
> it. First check fixed field q to see if it's RBM data, and hope fixed
> field q is right. Oh, and don't forget to check if it's encoded in UTF-8
> or MARC-8 by checking this other fixed field, which we know is wrong
> most of the time."
>
> I am seriously barely exaggerating. The amount of knowledge you need to
> get all but the simplest data out of typical US library MARC is huge;
> this knowledge is documented in a half dozen (at least) different
> places, when it's documented at all and not just in 'cataloger
> tradition'; and even once you DO figure out what's going on, the answer
> is often that exactly what you want isn't there. (How do I know if an
> 856 represents a full text link or just table of contents or just a
> publisher's site offering to sell me the book, again? Because I'd really
> like to know. How do I get machine-interpretable serials coverage
> information, so my software can answer the question of whether we hold
> 1980 volume 20 issue 5 and, if so, in what location? With the majority of
> US libraries' data, I simply can't.)
>
> Jonathan
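
To make that concrete for anyone on the list who hasn't tried it: below is a
rough sketch in Python of the kind of guessing game Jonathan is describing,
using the pymarc library. The Leader/09 character-coding flag and the 856
second-indicator values are real MARC 21 conventions, but the file name and
the classify_856() heuristic are invented here for illustration; this is a
sketch of the problem, not a recipe that works reliably, which is exactly the
point.

# Rough sketch only -- assumes the pymarc library and a placeholder
# file name ("records.mrc"); classify_856() is an invented heuristic,
# not any kind of standard.
from pymarc import MARCReader

def encoding_of(record):
    # Leader position 09 is the official character-coding flag:
    # 'a' means Unicode, blank means MARC-8 -- when it's filled in
    # correctly, which it often isn't.
    return "UTF-8" if record.leader[9] == "a" else "MARC-8 (allegedly)"

def classify_856(field):
    # Second indicator '0' is supposed to mean the resource itself,
    # '2' a merely related resource; subfields $3 and $z sometimes say
    # "Table of contents" or "Publisher description". None of this is
    # reliable.
    note = " ".join(field.get_subfields("3", "z")).lower()
    if "table of contents" in note:
        return "table of contents"
    if "publisher" in note:
        return "publisher's site?"
    if field.indicator2 == "0":
        return "maybe full text"
    if field.indicator2 == "2":
        return "related resource"
    return "no idea"

with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:
            continue  # pymarc yields None for records it can't parse
        print(encoding_of(record))
        for f in record.get_fields("856"):
            for url in f.get_subfields("u"):
                print("  ", classify_856(f), url)

And even this much only addresses the easier half of the question; the
machine-interpretable serials coverage Jonathan asks for simply isn't in most
records to begin with.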
--
Heather Christenson
Mass Digitization Project Manager
University of California
California Digital Library
http://www.cdlib.org/inside/projects/massdig/