Re: Google Magicians?

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Mon, 21 Sep 2009 12:34:48 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
I completely understand the power of good metadata.  I know a decent 
(just decent, admittedly) amount about MARC and AACR2 due to excellent 
preparation in library school in classes from Allyson Carlyle, and a 
three-year career of spending significant time talking to catalogers, 
reading about cataloging, working with MARC and AACR2 data, and reading 
cataloging standards. Sure, I don't know as much as an expert cataloger 
with 20 years' experience, but I'm not a babe in the woods.

I still find it very difficult to get all but the most trivial data out 
of our _actual_, in-practice MARC corpora in a way that will actually 
be consistent and useful to the users.

I know dozens of people who agree with me, including catalogers, 
catalogers with decades of experience (talk to Diane Hillmann, I don't 
think anyone can say she doesn't understand cataloging or respect good 
metadata), and around 10 people who have posted to this list.
Certainly reasonable people can disagree though, sure.

I resent this being portrayed as a debate between those who understand 
the power of good metadata and those who don't. I understand the power 
of good metadata; I just wish we had more of it.

Jonathan

Trish Culkin wrote:
> I think it *IS* more difficult than it should be, and hence more expensive,
> to convince system designers and software engineers to work with the
> intricacies and embedded intelligence of AACR2/MARC metadata.  In over 25
> years of managing crews of developers in two different ILS companies, I
> found that their tendency was always to "rethink" or "reinvent", or at least
> "simplify" the application and use of MARC data, and this is likely true at
> Google today.
>
> This was probably originally an off-shoot of the "not invented here"
> syndrome, but now I think it's more a matter of AACR2/MARC's complexity not
> being transparent and not easily succumbing to manipulation by standard
> tools. Developers typically expect the data to fit into more traditional
> (and simpler) data-models, and it's hard to entice them (or their business
> managers)  into deconstructing another universe prior to writing new
> applications.
>
> This is notwithstanding Jane's description of currently available options
> for manipulating data -- the use and value is obvious to those in the
> library trade, but not so much outside this venue and it kind of makes her
> Catch 22  point: "... those who have cataloging/bibliographic knowledge lack
> computing knowledge/server space. Those who have computing knowledge/server
> space probably lack cataloging/bibliographic knowledge."
>
> If the objective is to use this data to its fullest potential, and if past
> experience is any indicator, it will require a mix of  pressure from skilled
> users, informed persistence from inside and outside Google to counter profit
> objectives, and many iterations to achieve something approximating
> responsible use.
>
> I'm not sure whether it's sad or validating to watch this struggle between
> those who understand the power of good metadata and those who
> have the skills to make best use of it. Both, I guess.
>
>
> On Mon, Sep 21, 2009 at 9:39 AM, Jacobs, Jane W <
> Jane.W.Jacobs_at_queenslibrary.org> wrote:
>
>   
>> Jonathan Rochkind Wrote:
>>
>>> All I can say is that I and every other programmer in libraries that I
>>> know that has tried to work with AACR2/MARC metadata has found that it
>>> is not nearly as simple as you say to identify data elements of
>>> interest.   Despite our familiarity with the relevant standards, such as
>>> they are.
>>
>> ...
>>
>>> All I can say is the only people I know that think "it should be easy
>>> to get whatever data you want out of library MARC" are people who
>>> aren't programmers who have tried.
>>
>> I'm not much of a programmer, but using the open-source Perl module,
>> developed by REAL programmers (really GOOD programmers, I would add),
>> I've managed to pull out pretty much everything I needed.  On the
>> rare occasions when we needed and were able to hire a real programmer
>> the results were excellent.
>>
>> If I were a real programmer and didn't want to dip into the Perl module
>> to grab what I wanted, I would probably want to use XML; there are
>> already programs to convert MARC to MARC-XML.  MARC-XML is pretty
>> verbose and kludgy in terms of taking up space on your servers, but if
>> you have plenty of server space to stash it on it's no problem.  Grabbing
>> things out of XML, even the kludgy MARC kind, is quite easy, as long as
>> you know where you're grabbing from.
>>
>> Ironically those who have cataloging/bibliographic knowledge lack
>> computing knowledge/server space. Those who have computing
>> knowledge/server space probably lack cataloging/bibliographic knowledge.
>> Catch-22!
>>
>> However on the following point I expect you're totally correct!
>>
>>> Google may have much more resources than any one of our libraries do,
>>> but they still choose to expend them or not based on cost benefit.  I
>>> still suspect Google's estimate of the 'cost' is higher than you think
>>> it is, AND that their estimate of the 'benefit' of using library data is
>>> lower than you think it is.
>>
>> JJ
>>
>>
>> **Views expressed by the author do not necessarily represent those of
>> the Queens Library.**
>>
>> Jane Jacobs
>> Asst. Coord., Catalog Division
>> Queens Borough Public Library
>> 89-11 Merrick Blvd.
>> Jamaica, NY 11432
>> tel.: (718) 990-0804
>> e-mail: Jane.W.Jacobs_at_queenslibrary.org
>> FAX. (718) 990-8566
Received on Mon Sep 21 2009 - 12:39:54 EDT
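
As a rough illustration of the MARC-XML point quoted above ("quite easy, as
long as you know where you're grabbing from"), here is a minimal Python
sketch using only the standard library. The sample record is made up for
this example, and the helper name `subfields` is hypothetical; only the
MARC-XML element names and namespace come from the MARC 21 XML schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC-XML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up record, invented purely for illustration.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title :</subfield>
    <subfield code="b">a made-up subtitle /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Cataloging</subfield>
    <subfield code="x">Data processing.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in every datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text or "" for sf in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE_RECORD)
print(subfields(record, "245", "a"))  # title proper
print(subfields(record, "650", "a"))  # topical subject headings
```

Locating a subfield is the easy half. Note that the 245 $a value comes back
with its trailing ISBD punctuation still attached, which hints at the other
half of the thread: turning what you grabbed into something consistent.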