Re: Google Magicians?

From: Trish Culkin <trish.culkin_at_nyob>
Date: Mon, 21 Sep 2009 15:20:05 -0600
To: NGC4LIB_at_LISTSERV.ND.EDU

Well, some of the mainstream press has chimed in as you have shown, Bernie

-- Nunberg in the Chronicle
http://chronicle.com/article/Googles-Book-Search-A/48245/

-- Grafton in the New Yorker
http://www.newyorker.com/online/blogs/books/2009/09/google-books-and-the-judge.html

-- the Economist Business section
http://www.economist.com/displaystory.cfm?story_id=14376406

but I agree that gnashing of teeth in the blogosphere won't carry the day.

Maybe we need volunteers to write directly to Jon Orwant at Google, with
copies to the Economist, the NY Times or another high-viewership media outlet.
Does anyone know someone at ALA or any of its spinoffs who could be approached
to rally sentiment? Maybe with a web-based write-in campaign, and some Twitter
action, with some careful exposition that laymen can relate to? Or how about
at least a thoughtful and persuasive article in LJ or PW?

On Mon, Sep 21, 2009 at 1:25 PM, B.G. Sloan <bgsloan2_at_yahoo.com> wrote:

>
> Trish Culkin said: "...we'd be better served by continuing the pressure on
> Google to 1) understand it and then 2) use it."
>
> So who's pressuring Google to do this?
>
> Bernie Sloan
>
> --- On Mon, 9/21/09, Trish Culkin <trish.culkin_at_GMAIL.COM> wrote:
>
>
> From: Trish Culkin <trish.culkin_at_GMAIL.COM>
> Subject: Re: [NGC4LIB] Google Magicians?
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Monday, September 21, 2009, 2:25 PM
>
>
> Tarring the data "trapped" in the MARC format seems an oversimplification.
> I know the format is a bear, and that not all MARC records are created
> equal,  but in general MARC is literally a resource without peer.
>
> Setting aside  questions of classification and LCSH vs BISAC, it's hard to
> argue that "juried" MARC records -- those coming from LC, OCLC and major
> academic and public libraries -- do not in general contain good descriptive
> cataloging -- i.e. accurate representation of authorship, place and date of
> publication, edition, language, etc.  These descriptive facets were the
> first focus of Geoffrey Nunberg's Google slam (e.g. all the bad dates) and
> it seems counterproductive to argue that these good MARC records are not
> worth matching correctly to Google digital editions.
>
> For the record, I also believe that computer manipulation of the
> classification embedded in MARC (both Dewey and LC) and of LC Name and
> Subject Headings combined with a  good authority file, would add value to
> both casual search and retrieval as well as to rigorous scholarly work.
> Combined with tagging, text analysis, user-participatory efforts and great
> graphics, the potential for using computers to help the world understand its
> intellectual heritage is tremendous.
>
> Bottom line, the information contained in MARC records from established
> sources represents a 200-year heritage of good-faith professional effort to
> describe intellectual works and place them in intellectual context. It's not
> only the best we have, it's all we have, and rather than discard it in the
> hopes that "better" data will emanate from somewhere, we'd be better served
> by continuing the pressure on Google to 1) understand it and then 2) use it.
>
>
>
>
> On Mon, Sep 21, 2009 at 10:34 AM, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
>
> > I completely understand the power of good metadata.  I know a decent (just
> > decent, admittedly) amount about MARC and AACR2 due to excellent preparation
> > in library school in classes from Alysson Carlyle, and a three-year career
> > of spending significant time talking to catalogers, reading about
> > cataloging, working with MARC and AACR2 data, and reading cataloging
> > standards. Sure, I don't know as much as an expert cataloger with 20 years'
> > experience, but I'm not a babe in the woods.
> >
> > I still find it very difficult to get all but the most trivial data out of
> > our _actual_ in-practice MARC corpuses, in a way that will actually be
> > consistent and useful to the users.
> > I know dozens of people who agree with me, including catalogers, catalogers
> > with decades of experience (talk to Diane Hillman, I don't think anyone can
> > say she doesn't understand cataloging or respect good metadata), and around
> > 10 people who have posted to this list.   Certainly reasonable people can
> > disagree though, sure.
> >
> > I resent this being portrayed as a debate between those who understand the
> > power of good metadata and those who don't. I understand the power of good
> > metadata, I just wish we had more of it.
> >
> > Jonathan
> >
> >
> > Trish Culkin wrote:
> >
> >> I think it *IS* more difficult than it should be, and hence more expensive,
> >> to convince system designers and software engineers to work with the
> >> intricacies and embedded intelligence of AACR2/MARC metadata.  In over 25
> >> years of managing crews of developers in two different ILS companies, I
> >> found that their tendency was always to "rethink" or "reinvent", or at least
> >> "simplify" the application and use of MARC data, and this is likely true at
> >> Google today.
> >>
> >> This was probably originally an off-shoot of the "not invented here"
> >> syndrome, but now I think it's more a matter of AACR2/MARC's complexity not
> >> being transparent and not easily succumbing to manipulation by standard
> >> tools. Developers typically expect the data to fit into more traditional
> >> (and simpler) data-models, and it's hard to entice them (or their business
> >> managers)  into deconstructing another universe prior to writing new
> >> applications.
> >>
> >> This is notwithstanding Jane's description of currently available options
> >> for manipulating data -- the use and value is obvious to those in the
> >> library trade, but not so much outside this venue and it kind of makes her
> >> Catch-22 point: "... those who have cataloging/bibliographic knowledge lack
> >> computing knowledge/server space. Those who have computing knowledge/server
> >> space probably lack cataloging/bibliographic knowledge."
> >>
> >> If the objective is to use this data to its fullest potential, and if past
> >> experience is any indicator, it will require a mix of  pressure from skilled
> >> users, informed persistence from inside and outside Google to counter profit
> >> objectives, and many iterations to achieve something approximating
> >> responsible use.
> >>
> >> I'm not sure whether it's sad or validating to watch this struggle between
> >> those who understand the power of good metadata and those who
> >> have the skills to make best use of it. Both, I guess.
> >>
> >>
> >> On Mon, Sep 21, 2009 at 9:39 AM, Jacobs, Jane W <Jane.W.Jacobs_at_queenslibrary.org> wrote:
> >>
> >>
> >>
> >>> Jonathan Rochkind Wrote:
> >>>
> >>>
> >>>
> >>>> All I can say is that I and every other programmer in libraries that I
> >>>> know that has tried to work with AACR2/MARC metadata has found that it
> >>>> is not nearly as simple as you say to identify data elements of
> >>>> interest.   Despite our familiarity with the relevant standards, such as
> >>>> they are.
> >>>
> >>> ...
> >>>
> >>>
> >>>
> >>>> All I can say is the only people I know that think "it should be easy to
> >>>> get whatever data you want out of library MARC" are people who aren't
> >>>> programmers who have tried.
> >>>
> >>> I'm not much of a programmer, but using the open-source Perl module,
> >>> developed by REAL programmers (really GOOD programmers, I would add),
> >>> I've managed to pull out pretty much everything I needed.  On the
> >>> rare occasions when we needed and were able to hire a real programmer
> >>> the results were excellent.
> >>>
> >>> If I were a real programmer and didn't want to dip into the Perl module
> >>> to grab what I wanted, I would probably want to use XML; there are
> >>> already programs to convert MARC to MARC-XML.  MARC-XML is pretty
> >>> verbose and kludgey in terms of taking up space on your servers, but if
> >>> you have plenty of server space to stash it on it's no problem.  Grabbing
> >>> things out of XML, even the kludgey MARC kind, is quite easy, as long as
> >>> you know where you're grabbing from.
> >>>
> >>> Ironically those who have cataloging/bibliographic knowledge lack
> >>> computing knowledge/server space. Those who have computing
> >>> knowledge/server space probably lack cataloging/bibliographic knowledge.
> >>> Catch-22!
> >>>
> >>> However on the following point I expect you're totally correct!
> >>>
> >>>
> >>>
> >>>> Google may have much more resources than any one of our libraries do,
> >>>> but they still choose to expend them or not based on cost benefit.  I
> >>>> still suspect Google's estimate of the 'cost' is higher than you think
> >>>> it is, AND that their estimate of the 'benefit' of using library data is
> >>>> lower than you think it is.
> >>>
> >>> JJ
> >>>
> >>>
> >>> **Views expressed by the author do not necessarily represent those of
> >>> the Queens Library.**
> >>>
> >>> Jane Jacobs
> >>> Asst. Coord., Catalog Division
> >>> Queens Borough Public Library
> >>> 89-11 Merrick Blvd.
> >>> Jamaica, NY 11432
> >>> tel.: (718) 990-0804
> >>> e-mail: Jane.W.Jacobs_at_queenslibrary.org
> >>> FAX. (718) 990-8566
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >
>
>
> --
> Trish
>
>
>
>
>

-- 
Trish
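
A rough sketch of the MARC-XML route Jane describes in the quoted message
above (convert to MARC-XML, then grab fields once you know where they
live). It assumes Python and its standard xml.etree.ElementTree module
rather than the Perl module she mentions. The sample record and the
subfields() helper are made up for illustration, and real library records
are a good deal messier than this one, which is roughly Jonathan's point.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARCXML ("slim") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record for illustration only, not real catalog data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman,</subfield>
    <subfield code="d">1819-1891.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick /</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">New York :</subfield>
    <subfield code="b">Harper,</subfield>
    <subfield code="c">1851.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields `tag`.

    Hypothetical helper: it always returns a list, because MARC fields
    and subfields can repeat or be missing altogether.
    """
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)
author = subfields(record, "100", "a")   # main entry, personal name
title = subfields(record, "245", "a")    # title proper
pub_date = subfields(record, "260", "c") # date of publication as transcribed
print(author, title, pub_date)
```

Note that the values come back exactly as cataloged, trailing ISBD
punctuation and all ("1851." rather than 1851), so "knowing where you're
grabbing from" is only the first step toward consistent data.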