Well, it sounds like an opportunity for a modern-day "retrospective conversion" project. Talk about job security for catalogers. If there were an accepted standard XML format and we could employ an army of librarians who have the knowledge you describe below to re-catalog all our records into the new format, we could do what?
Hmmm
Rhonda
--- On Wed, 9/16/09, Jonathan Rochkind <rochkind_at_JHU.EDU> wrote:
> From: Jonathan Rochkind <rochkind_at_JHU.EDU>
> Subject: Re: [NGC4LIB] Why don't non-librarians value library data as highly as we do?
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Wednesday, September 16, 2009, 3:33 PM
> Jacobs, Jane W wrote:
> >
> > I'd argue that it wouldn't be hard at all.  To flip OCLC records to
> > MARC XML, grab what they want, and enjoy munching it six ways to
> > Sunday should all be child's play for the average Google programmer!
> >
>
> As someone who routinely tries to do that as part of my
> job, and who _comes_ from a library background and spends
> quite a bit of time trying to learn MARC and the conventions
> for putting things in MARC (including but not limited to
> AACR2), this has simply not been my experience. Just
> because someone is a "Google programmer" does not mean they
> can work magic with data that isn't there.
>
> Certainly it's not the encoding format that's a problem; of
> course it's no problem converting to MARC-XML or whatever
> you want. It's the schema -- the way the data elements
> are either there or not, and how they are 'tagged' for
> recognition.
>
> Frequently the answer to "How do I get this piece of data I
> want" is along the lines of: "Well, it'll be in this field,
> UNLESS this other field is X, in which case it'll be in
> field Y, UNLESS field Y is being used for Z (to try and
> figure out if it's Z, look at fixed fields a, b, and c, the
> different combinations of which determine that, but
> there's no guarantee they're filled out correctly). 
> Oh, and that's assuming it's a post-1972 record, in older
> records they did things entirely differently and put the
> data over in field N. Oh, and ALL of that is assuming
> this is AACR2 data, the corpus also includes Rare Books and
> Manuscripts data, and those guys do things entirely
> differently, although it's still in MARC, you've got to look
> in this OTHER field for it. First check fixed field q
> to see if it's RBM data, and hope fixed field q is right.
> Oh, and don't forget to check if it's encoded in UTF-8 or
> MARC-8 by checking this other fixed field, which we know is
> wrong most of the time."
>
> I am seriously barely exaggerating.  The amount of
> knowledge you need to get all but the simplest data out of
> typical US library MARC is huge, this knowledge is
> documented in a half dozen (at least) different places, when
> it's documented at all and not just in 'cataloger
> tradition', and even once you DO figure out what's going
> on---the answer is often that exactly what you want isn't
> there. (How do I know if an 856 represents a full text
> link or just table of contents or just a publisher's site
> offering to sell me the book, again? Cause I'd really like
> to. How do I get machine-interpretable serials coverage
> information, so my software can answer the question of
> whether we hold 1980 volume 20 issue 5 and if so in what
> location?  With the majority of US libraries' data, I
> simply can't.)
>
> Jonathan
>
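A minimal sketch of two of the checks described in the quoted message: reading the character encoding a record declares, and guessing what an 856 link points to. It assumes MARC 21 conventions (Leader position 09 is "a" for Unicode and blank for MARC-8; the 856 second indicator is 0 for the resource itself, 1 for a version of it, 2 for a related resource). The Field856 class, the function names, and the note-matching keywords are hypothetical, made up for illustration; they are not part of any MARC library, and the heuristics are guesses of exactly the kind the message complains about.

```python
from dataclasses import dataclass, field


@dataclass
class Field856:
    """Hypothetical, simplified stand-in for a parsed 856 field."""
    ind2: str                                      # second indicator
    subfields: dict = field(default_factory=dict)  # e.g. {"u": url, "3": note}


def declared_encoding(leader: str) -> str:
    """Return the encoding the record *claims* in Leader/09.

    The claim may be wrong; nothing here verifies the actual bytes.
    """
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters")
    return "utf-8" if leader[9] == "a" else "marc-8"


def guess_link_type(f: Field856) -> str:
    """Heuristic guess at what an 856 link offers; not authoritative."""
    # Free-text notes ($3 materials specified, $z public note) often say
    # more than the indicator does, so look at them first.
    note = " ".join(f.subfields.get(code, "") for code in ("3", "z")).lower()
    if "table of contents" in note:
        return "table of contents"
    if "publisher" in note:
        return "publisher information"
    # Fall back to the second indicator, which is frequently blank or wrong.
    if f.ind2 == "0":
        return "possibly full text"
    if f.ind2 == "1":
        return "version of resource"
    if f.ind2 == "2":
        return "related resource"
    return "unknown"
```

Even in this toy form, the answer for many real records would be "unknown", and a "possibly full text" result only reflects what the cataloger coded, not what the link actually delivers.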
***********************************
Received on Fri Sep 18 2009 - 11:25:47 EDT