Re: An article to warm the hearts of cataloguers

From: Goldner,Matt <goldnerm_at_nyob>
Date: Tue, 8 Sep 2009 19:57:30 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

I am posting this on behalf of my colleague Chip Nilges with regard to Google Books and OCLC.
Matt Goldner

Matthew R. Goldner
Product & Technology Advocate
OCLC

I wanted to clarify what OCLC is doing with WorldCat and Google Books. We've made the entire WorldCat database (excluding certain metadata records that OCLC is contractually prohibited from providing) available to Google to support discovery of the books Google has digitized from library collections. In exchange, Google has agreed to display a link to libraries on pages describing library-digitized materials. Google is also providing OCLC with the metadata needed to represent, in WorldCat, all of the library materials Google is digitizing.

Our focus in structuring the agreement was to support the interests of our members, who wanted WorldCat records to be used for their digitized collections in Google.  We also wanted to ensure that libraries were present as a choice on the pages describing their digitized content.  

We continue to work with Google.  We expect the relationship to evolve to meet the needs of our members, and we are listening closely to these discussions.  

Chip Nilges
Vice President, Business Development
OCLC

-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
Sent: Monday, September 07, 2009 11:56 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] An article to warm the hearts of cataloguers

I was at the meeting where Nunberg presented this, and Dan Clancy of 
Google responded there as well. I believe that what Google says about 
using library data is only partially true. There are perfectly good 
library records for most of the books that Google has scanned -- after 
all, those books come from academic libraries. Take a book with terrible 
metadata and look it up in one of the participating library catalogs -- 
you usually find good metadata. One commenter says:

"But I cannot see why Google could not license Library of Congress and 
OCLC catalogs to improve the metadata."

Obviously, Google should be able to use the data from the participating 
libraries, and, if the items are at LC, it could retrieve the record 
from LC's database without restrictions. I know this is possible because 
the Open Library is using this technique to retrieve bibliographic data 
for the public domain books scanned by Google.

However, Google *does* have a contract with OCLC, as OCLC is supposedly 
creating a WorldCat record for each book digitized by Google. I believe 
that OCLC wishes to include the Google holdings in WorldCat, and the 
holding library will be Google. If you remember the early Google Books 
data, it was minimal, and did not include subject headings. A reliable 
Google source once dropped an aside during a discussion of how bad that 
early metadata was (and why weren't they keeping the whole MARC record 
that they got from the libraries): "OCLC won't let us." Unfortunately, 
there is no way to know if OCLC has a contract with Google, nor what it 
says. I asked this question directly of Dan Clancy at the meeting when 
Nunberg spoke, and Clancy did not answer. My question was: "Do you have 
a contract with OCLC? And does it restrict what data you can use?"

To add to that, I have some odd... clues. If you look at a record on GBS 
and the same record in the providing library, the subject headings 
follow a pattern:

GBS:
  Indians of North America
  Indian baskets

Library:
  Indians of North America -- Languages.
  Indians of North America -- California
  Indian baskets -- North America

This is the same pattern that appeared in the records released by the 
University of Michigan for their public domain scanned books -- only the 
$a of the 6XX field was included. (I wrote about this: 
http://kcoyle.blogspot.com/2008/05/amputation.html). Many other fields 
are also excluded from those records. You can see here a post from 
someone at Michigan to this very list:
  http://serials.infomotions.com/ngc4lib/archive/2008/200805/0676.html
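
To make the pattern concrete, here is a minimal Python sketch of what
keeping only the $a of a 6XX field does to the three library headings
above. It is an illustration under stated assumptions, not a description
of how Google or Michigan actually process records: the representation
of a field as (subfield code, value) pairs, the subfield codes given to
the subdivisions ($x, $z), the function names, and the de-duplication
step are all hypothetical.

```python
# Each 6XX field is modeled as a list of (subfield code, value) pairs.
# The $x / $z codes assigned here are assumptions for illustration.
library_fields = [
    [("a", "Indians of North America"), ("x", "Languages.")],
    [("a", "Indians of North America"), ("z", "California")],
    [("a", "Indian baskets"), ("z", "North America")],
]


def display(field):
    """Render a field the way a library catalog would show it."""
    return " -- ".join(value for _code, value in field)


def truncate_to_a(field):
    """Keep only the $a values, dropping every subdivision."""
    return [value.rstrip(" .") for code, value in field if code == "a"]


def reduced_headings(fields):
    """Truncate every field to $a and drop repeats, keeping first-seen order.

    De-duplication is an assumption made to match the two-line GBS display.
    """
    seen = []
    for field in fields:
        for heading in truncate_to_a(field):
            if heading not in seen:
                seen.append(heading)
    return seen


if __name__ == "__main__":
    print("Library:")
    for field in library_fields:
        print("  " + display(field))
    print("After keeping only $a:")
    for heading in reduced_headings(library_fields):
        print("  " + heading)
```

Under these assumptions, three distinct library headings collapse into
two generic ones, and the language and geographic subdivisions are lost.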

In other words, we have shot ourselves in the foot by not allowing 
Google to make use of the metadata created by libraries.
Given that OCLC is a membership organization, the members should be able 
to ask about any agreement between OCLC and Google, and should be able 
to instruct OCLC to release the full records to Google. Without the full 
records, any effort to coordinate GBS and library catalogs is going to 
be very difficult, and will, in the end, cost libraries considerable 
time and effort. It will also make the Google institutional product 
(should the settlement be blessed by the court) much less useful to 
libraries. This is all so nonsensical that I cannot understand WHY the 
participating libraries have agreed to it, yet it couldn't be the way it 
is without their agreement... and without their silence.

kc

James Weinheimer wrote:
> I wrote this on Autocat, and thought that readers of this list might be
> interested as well. JW
>
> Now that I have read the entire article
> http://languagelog.ldc.upenn.edu/nll/?p=1701 and the in-depth response from
> Google http://languagelog.ldc.upenn.edu/nll/?p=1701#comment-41758, I must
> say that I think (more probably, I *hope*) that this may be the beginning of
> one of the most important discussions on cataloging and "metadata" today,
> and perhaps of all time. The importance comes not so much from what they
> say--which is rather elementary--but from the importance that the
> non-library community places on these issues, and even more importantly,
> this discussion is taking place not within the dusty pages of some forgotten
> issue of a library journal or on a closed specialist listserv, but on an
> open, important scholarly website (not a library website) and replied to by
> the most important information company in the world. This could be a moment
> for librarians, and especially catalogers (who are the experts in any case),
> to take advantage of a soap box that may be temporary. But we can't be too
> technical or overwhelming in our arguments.
>
> Some observations:
> 1) It looks as if one of my predictions for the future is already outdated.
> I had predicted that "all metadata" would be thrown together into a single
> database somewhere resulting in a huge mess. According to the fellow from
> Google, this has happened already since he mentions metadata they have taken
> from Brazil, Armenia, Korea and a few other places. It is interesting that
> no one anywhere discusses this in terms of "rules" or "standards" but as an
> Armenian database, or a Brazilian database, instead of an "AACR2 database"
> or "ISBD" or German or French or Italian rules, or whatever. Perhaps when
> the discussion is being led by non-expert metadata creators, this should not
> be surprising. (For the sake of clarification, in this discussion, there is
> an expert metadata *user* (a professor) and an expert metadata *aggregator*
> (the fellow from Google), but no expert metadata *creator*)
>
> 2) A lot of the errors that the Google fellow blamed on libraries make me
> skeptical, to say the least. As one example, he mentions, "Geoff identifies
> a topology text (I assume this is Curvature and Betti Numbers) as belonging
> to Didactic Poetry; this beaut comes to us from an aggregator of library
> catalogs. Perhaps the subject heading "Differential Geometry" was next to it
> in an alphabetic list, and a cataloger chose wrong."
>
> Sorry, but I can't buy that one. While catalogers certainly make lots of
> mistakes, they make certain types of mistakes, and these types are quite
> different from mistakes made by a computer. Unless this subject was assigned
> by a human with no understanding of the English language (perhaps a
> secretary in Korea who does not understand English), then this is, without a
> doubt, a computer mistake.
>
> 3) The fellow from Google points out some other human mistakes that are
> highly interesting and that we should consider at length. All of the
> problems pointed out are rather elementary, but we know there are problems
> in cataloging that are truly difficult. How are corporate bodies handled?
> Uniform titles? Anonymous works? Pseudonyms?
>
> 4) Taken as a whole, it appears that the general public considers that
> "metadata quality" is important, which is absolutely great and something
> that we must capitalize upon. But the comments make it imperative that we
> see the problems with metadata today not only in terms of our own
> collections or our own communities, but how to make bibliographic metadata
> in general interoperable and coherent among all metadata creators in all
> communities on a world-wide scale. Google is forcing the issue.
>
> How will "human expert-created" metadata work in an environment similar to
> Google Books? I still think people will want to search one database (just
> like they do Google) and this initial search will almost always be a
> full-text keyword search on a corpus of text. The metadata we make will
> allow for clickable limits, similar to how it works in Koha and WorldCat now
> (of course, they work only with the catalog records, not with full text).
> See, e.g. in the Athens County Public Library how the headings are extracted
> from the records retrieved in the multiple display so that users can narrow
> their results:
> http://acpl.kohalibrary.com/cgi-bin/koha/opac-search.pl?q=roman+archaeology.
> In a new system including full-text, this method could be expanded
> indefinitely to add automatically extracted keywords, Web2.0-type results
> (ratings, suggestions by others) and other limits.
>
> Therefore, I don't think people will be browsing subjects or name headings
> in their initial searches. In the "limits" there may be some browsing
> performed. Therefore, how could these types of browsing be made the most
> useful with multiple rules, forms of names, and the problems mentioned in
> the Language Log post?
>
> This is the environment we are entering. If we expect everybody to follow
> AACR2 and/or RDA we are simply being unrealistic. Instead of creating new
> rules that 1/10 of one percent of the world will use, we should be focusing
> our energies on making what we have now more useful and coherent. From the
> message and comments on Language Log, it seems as if our public wants this,
> and even Google itself seems to be taking these things seriously.
>
> This new world is largely unknown and we must feel our way along, especially
> in this difficult economic climate. ISBD was a great beginning that we can
> and should build upon, but today we must look beyond the library community
> to everyone in the same field. This is happening whether we like it or not.
> Google is forcing our hand by throwing everything in together.
>
> James Weinheimer  j.weinheimer_at_aur.edu
> Director of Library and Information Services
> The American University of Rome
> Rome, Italy
>
>
>   

-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------