Re: An article to warm the hearts of cataloguers

From: Rinne, Nathan (ESC) <RinneN_at_nyob> Date: Thu, 10 Sep 2009 09:38:24 -0500 To: NGC4LIB_at_LISTSERV.ND.EDU

Karen,

First of all, please respond, OCLC!

Second, being the curmudgeon again, I think it is appropriate we ask
what's in this deal for OCLC? (besides: libraries might actually
survive).  What's in it for serious library users (for whom the
satisficing is never enough)?  Maybe, just maybe, OCLC won't give Google
all the metadata until they a) have given their member libraries a
chance to share their voice in this matter and b) have Google's
assurance that libraries/librarians (through OCLC) will have a
*guaranteed and substantial* voice (i.e. contractual agreement) in how
that metadata is used?

When it comes to the valuable treasure of privately and publicly funded
metadata that librarians have created over the years, do we really want
Google to be freely unchecked and duty-bound to no one?  Free to use the
metadata however they want? (like they have been doing so far - do we
really think if we gave them everything without strings that then they
would be careful to do all the things catalogers/scholars think they
should do?  What about "to him who has been faithful with a little...")
I can see the advantages of this (I use Google Book Search for all kinds
of stuff and find it useful for all kinds of things libraries can't give
me), but what might the disadvantages be (think "law of unintended
consequences")?

No good guys and bad guys here.  Just thinking about the "balance of
powers" idea.  

That said, I again appeal to OCLC to reply.  Even a slightly more
forthcoming response would be helpful.

Regards, 

Nathan Rinne

Media Cataloging Technician

Educational Service Center

11200 93rd Avenue North

Maple Grove, MN 55369

Email: rinnen_at_district279.org

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
Sent: Thursday, September 10, 2009 4:46 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] An article to warm the hearts of cataloguers

Well, that very carefully avoids answering the question. I will repeat  
the question, clearly:

Are there restrictions on what MARC fields and subfields Google can  
include in its metadata?

kc

Quoting "Goldner,Matt" <goldnerm_at_OCLC.ORG>:

> I am posting this on behalf of my colleague Chip Nilges in regards   
> to Google Books and OCLC.
> Matt Goldner
>
> Matthew R. Goldner
> Product & Technology Advocate
> OCLC
>
>
> I wanted to clarify what OCLC is doing with WorldCat and Google   
> Books.  We've made the entire WorldCat database (excluding certain   
> metadata records that OCLC is contractually prohibited from   
> providing) available to Google to support discovery of the books   
> Google has digitized from library collections.  In exchange, Google   
> has agreed to display a link to libraries on pages describing   
> library digitized materials.  Google is also providing OCLC with the
> metadata needed to represent, in WorldCat, all of the library
> materials Google is digitizing.
>
> Our focus in structuring the agreement was to support the interests   
> of our members, who wanted WorldCat records to be used for their   
> digitized collections in Google.  We also wanted to ensure that   
> libraries were present as a choice on the pages describing their   
> digitized content.
>
> We continue to work with Google.  We expect the relationship to   
> evolve to meet the needs of our members, and we are listening   
> closely to these discussions.
>
> Chip Nilges
> Vice President, Business Development
> OCLC
>
> -----Original Message-----
> From: Next generation catalogs for libraries   
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
> Sent: Monday, September 07, 2009 11:56 AM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] An article to warm the hearts of cataloguers
>
> I was at the meeting where Nunberg presented this, and Dan Clancy of
> Google responded there as well. I believe that what Google says about
> using library data is only partially true. There are perfectly good
> library records for most of the books that Google has scanned -- after
> all, those books come from academic libraries. Take a book with terrible
> metadata and look it up in one of the participating library catalogs --
> you usually find good metadata. One commenter says:
>
> "But I cannot see why Google could not license Library of Congress and
> OCLC catalogs to improve the metadata."
>
> Obviously, Google should be able to use the data from the participating
> libraries, and, if the items are at LC, it could retrieve the record
> from LC's database without restrictions. I know this is possible because
> the Open Library is using this technique to retrieve bibliographic data
> for the public domain books scanned by Google.
>
> However, Google *does* have a contract with OCLC, as OCLC is supposedly
> creating a WorldCat record for each book digitized by Google. I believe
> that OCLC wishes to include the Google holdings in WorldCat, and the
> holding library will be Google. If you remember the early Google Books
> data, it was minimal, and did not include subject headings. A reliable
> Google source once dropped an aside during a discussion of how bad that
> early metadata was (and why weren't they keeping the whole MARC record
> that they got from the libraries): OCLC won't let us. Unfortunately,
> there is no way to know if OCLC has a contract with Google, nor what it
> says. I asked this question directly of Dan Clancy at the meeting when
> Nunberg spoke, and Clancy did not answer. My question was: "Do you have
> a contract with OCLC? And does it restrict what data you can use?"
>
> To add to that, I have some odd... clues. If you look at a record on GBS
> and the same record in the providing library, the subject headings
> follow a pattern:
>
> GBS:
>   Indians of North America
>   Indian baskets
>
> Library:
>   Indians of North America -- Languages.
>   Indians of North America -- California
>   Indian baskets -- North America
>
> This is the same pattern that appeared in the records released by the
> University of Michigan for their public domain scanned books -- only the
> $a of the 6XX field was included. (I wrote about this:
> http://kcoyle.blogspot.com/2008/05/amputation.html). Many other fields
> are also excluded from those records. You can see here a post from
> someone at Michigan to this very list:
>   http://serials.infomotions.com/ngc4lib/archive/2008/200805/0676.html
>
> In other words, we have shot ourselves in the foot by not allowing
> Google to make use of the metadata created by libraries.
> Given that OCLC is a membership organization, the members should be able
> to ask about any agreement between OCLC and Google, and should be able
> to instruct OCLC to release the full records to Google. Without the full
> records, any effort to coordinate GBS and library catalogs is going to
> be very difficult, and will, in the end, cost libraries considerable
> time and effort. It will also make the Google institutional product
> (should the settlement be blessed by the court) much less useful to
> libraries. This is all so nonsensical that I cannot understand WHY the
> participating libraries have agreed to it, yet it couldn't be the way it
> is without their agreement... and without their silence.
>
> kc
>
> James Weinheimer wrote:
>> I wrote this on Autocat, and thought that readers of this list might be
>> interested as well. JW
>>
>> Now that I have read the entire article
>> http://languagelog.ldc.upenn.edu/nll/?p=1701 and the in-depth response from
>> Google http://languagelog.ldc.upenn.edu/nll/?p=1701#comment-41758, I must
>> say that I think (more probably, I *hope*) that this may be the beginning of
>> one of the most important discussions on cataloging and "metadata" today,
>> and perhaps of all time. The importance comes not so much from what they
>> say--which is rather elementary--but from the importance that the
>> non-library community places on these issues. Even more importantly,
>> this discussion is taking place not within the dusty pages of some forgotten
>> issue of a library journal or on a closed specialist listserv, but on an
>> open, important scholarly website (not a library website), and it has been
>> replied to by the most important information company in the world. This
>> could be a moment for librarians, and especially catalogers (who are the
>> experts in any case), to take advantage of a soap box that may be temporary.
>> But we can't be too technical or overwhelming in our arguments.
>>
>> Some observations:
>> 1) It looks as if one of my predictions for the future is already outdated.
>> I had predicted that "all metadata" would be thrown together into a single
>> database somewhere resulting in a huge mess. According to the fellow from
>> Google, this has happened already since he mentions metadata they have taken
>> from Brazil, Armenia, Korea and a few other places. It is interesting that
>> no one anywhere discusses this in terms of "rules" or "standards" but as an
>> Armenian database, or a Brazilian database, instead of an "AACR2 database"
>> or "ISBD" or German or French or Italian rules, or whatever. Perhaps when
>> the discussion is being led by non-expert metadata creators, this should not
>> be surprising. (For the sake of clarification, in this discussion, there is
>> an expert metadata *user* (a professor) and an expert metadata *aggregator*
>> (the fellow from Google), but no expert metadata *creator*.)
>>
>> 2) A lot of the errors that the Google fellow blamed on libraries make me
>> skeptical, to say the least. As one example, he mentions, "Geoff identifies
>> a topology text (I assume this is Curvature and Betti Numbers) as belonging
>> to Didactic Poetry; this beaut comes to us from an aggregator of library
>> catalogs. Perhaps the subject heading "Differential Geometry" was next to it
>> in an alphabetic list, and a cataloger chose wrong."
>>
>> Sorry, but I can't buy that one. While catalogers certainly make lots of
>> mistakes, they make certain types of mistakes, and these types are quite
>> different from mistakes made by a computer. Unless this subject was assigned
>> by a human with no understanding of the English language (perhaps a
>> secretary in Korea who does not understand English), then this is, without a
>> doubt, a computer mistake.
>>
>> 3) The fellow from Google points out some other human mistakes that are
>> highly interesting and that we should consider at length. All of the
>> problems pointed out are rather elementary, but we know there are problems
>> in cataloging that are truly difficult. How are corporate bodies handled?
>> Uniform titles? Anonymous works? Pseudonyms?
>>
>> 4) Taken as a whole, it appears that the general public considers that
>> "metadata quality" is important, which is absolutely great and something
>> that we must capitalize upon. But the comments make it imperative that we
>> see the problems with metadata today not only in terms of our own
>> collections or our own communities, but how to make bibliographic metadata
>> in general interoperable and coherent among all metadata creators in all
>> communities on a world-wide scale. Google is forcing the issue.
>>
>> How will "human expert-created" metadata work in an environment similar to
>> Google Books? I still think people will want to search one database (just
>> like they do Google) and this initial search will almost always be a
>> full-text keyword search on a corpus of text. The metadata we make will
>> allow for clickable limits, similar to how it works in Koha and WorldCat now
>> (of course, they don't work with full-text and only the catalog records).
>> See, e.g. in the Athens County Public Library how the headings are extracted
>> from the records retrieved in the multiple display so that users can narrow
>> their results:
>> http://acpl.kohalibrary.com/cgi-bin/koha/opac-search.pl?q=roman+archaeology.
>> In a new system including full-text, this method could be expanded
>> indefinitely to add automatically extracted keywords, Web2.0-type results
>> (ratings, suggestions by others) and other limits.
>>
>> Therefore, I don't think people will be browsing subjects or name headings
>> in their initial searches. In the "limits" there may be some browsing
>> performed. Therefore, how could these types of browsing be made the most
>> useful with multiple rules, forms of names, and the problems mentioned in
>> the Language Log post?
>>
>> This is the environment we are entering. If we expect everybody to follow
>> AACR2 and/or RDA we are simply being unrealistic. Instead of creating new
>> rules that 1/10 of one percent of the world will use, we should be focusing
>> our energies on making what we have now more useful and coherent. From the
>> message and comments on Language Log, it seems as if our public wants this,
>> and even Google itself seems to be taking these things seriously.
>>
>> This new world is largely unknown and we must feel our way along, especially
>> in this difficult economic climate. ISBD was a great beginning that we can
>> and should build upon, but today we must look beyond the library community
>> to everyone in the same field. This is happening whether we like it or not.
>> Google is forcing our hand by throwing everything in together.
>>
>> James Weinheimer  j.weinheimer_at_aur.edu
>> Director of Library and Information Services
>> The American University of Rome
>> Rome, Italy
>>
>>
>>
>
>
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kcoyle@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596   skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
>