Re: An article to warm the hearts of cataloguers

From: Goldner,Matt <goldnerm_at_nyob>
Date: Thu, 10 Sep 2009 12:50:06 -0400

Once again I'm posting on behalf of my colleague Chip Nilges.

Matt Goldner

To answer Karen's most recent post, Google can use any WC metadata field.  And it's important to note as well that our agreement with Google is not exclusive.  We're happy to work with others in the same way.  The goal, as I said in my original post, is to support the efforts of our members to bring their collections online, make them discoverable, and drive traffic to library services.  



-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
Sent: Thursday, September 10, 2009 5:46 AM
Subject: Re: [NGC4LIB] An article to warm the hearts of cataloguers

Well, that very carefully avoids answering the question. I will repeat  
the question, clearly:

Are there restrictions on what MARC fields and subfields Google can  
include in its metadata?


Quoting "Goldner,Matt" <goldnerm_at_OCLC.ORG>:

> I am posting this on behalf of my colleague Chip Nilges in regards   
> to Google Books and OCLC.
> Matt Goldner
> Matthew R. Goldner
> Product & Technology Advocate
> I wanted to clarify what OCLC is doing with WorldCat and Google   
> Books.  We've made the entire WorldCat database (excluding certain   
> metadata records that OCLC is contractually prohibited from   
> providing) available to Google to support discovery of the books   
> Google has digitized from library collections.  In exchange, Google   
> has agreed to display a link to libraries on pages describing   
> library digitized materials.  Google is also providing OCLC with the
> metadata needed to represent, in WorldCat, all of the library
> materials Google is digitizing.
> Our focus in structuring the agreement was to support the interests   
> of our members, who wanted WorldCat records to be used for their   
> digitized collections in Google.  We also wanted to ensure that   
> libraries were present as a choice on the pages describing their   
> digitized content.
> We continue to work with Google.  We expect the relationship to   
> evolve to meet the needs of our members, and we are listening   
> closely to these discussions.
> Chip Nilges
> Vice President, Business Development
> -----Original Message-----
> From: Next generation catalogs for libraries   
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
> Sent: Monday, September 07, 2009 11:56 AM
> Subject: Re: [NGC4LIB] An article to warm the hearts of cataloguers
> I was at the meeting where Nunberg presented this, and Dan Clancy of
> Google responded there as well. I believe that what Google says about
> using library data is only partially true. There are perfectly good
> library records for most of the books that Google has scanned -- after
> all, those books come from academic libraries. Take a book with terrible
> metadata and look it up in one of the participating library catalogs --
> you usually find good metadata. One commenter says:
> "But I cannot see why Google could not license Library of Congress and
> OCLC catalogs to improve the metadata."
> Obviously, Google should be able to use the data from the participating
> libraries, and, if the items are at LC, it could retrieve the record
> from LC's database without restrictions. I know this is possible because
> the Open Library is using this technique to retrieve bibliographic data
> for the public domain books scanned by Google.
> However, Google *does* have a contract with OCLC, as OCLC is supposedly
> creating a WorldCat record for each book digitized by Google. I believe
> that OCLC wishes to include the Google holdings in WorldCat, and the
> holding library will be Google. If you remember the early Google Books
> data, it was minimal, and did not include subject headings. A reliable
> Google source once dropped an aside during a discussion of how bad that
> early metadata was (and why weren't they keeping the whole MARC record
> that they got from the libraries): OCLC won't let us. Unfortunately,
> there is no way to know if OCLC has a contract with Google, nor what it
> says. I asked this question directly of Dan Clancy at the meeting when
> Nunberg spoke, and Clancy did not answer. My question was: "Do you have
> a contract with OCLC? And does it restrict what data you can use?"
> To add to that, I have some odd... clues. If you look at a record on GBS
> and the same record in the providing library, the subject headings
> follow a pattern:
> GBS:
>   Indians of North America
>   Indian baskets
> Library:
>   Indians of North America -- Languages.
>   Indians of North America -- California
>   Indian baskets -- North America
> This is the same pattern that appeared in the records released by the
> University of Michigan for their public domain scanned books -- only the
> $a of the 6XX field was included. (I wrote about this:
> Many other fields
> are also excluded from those records. You can see here a post from
> someone at Michigan to this very list:
> In other words, we have shot ourselves in the foot by not allowing
> Google to make use of the metadata created by libraries.
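
The pattern described above (only the $a of each 6XX field surviving) can be sketched in a few lines of Python. This is an illustration only, not the process Google, OCLC, or Michigan actually used. The record layout, the function names, the $x and $z subfield codes assigned to the subdivisions, and the 245 title are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical in-memory layout: a field is (MARC tag, [(subfield code, value), ...]).
Subfield = Tuple[str, str]
Field = Tuple[str, List[Subfield]]


def full_heading(field: Field) -> str:
    """Join every subfield with ' -- ', roughly as a library catalog displays it."""
    _, subfields = field
    return " -- ".join(value for _, value in subfields)


def a_only_heading(field: Field) -> str:
    """Keep only the $a subfield(s), dropping all subdivisions."""
    _, subfields = field
    return " ".join(value for code, value in subfields if code == "a")


def reduce_subjects(fields: List[Field]) -> List[str]:
    """Return de-duplicated $a-only headings for 6XX fields, in original order."""
    headings: List[str] = []
    for field in fields:
        tag, _ = field
        if not tag.startswith("6"):
            continue  # not a subject field
        heading = a_only_heading(field)
        if heading and heading not in headings:
            headings.append(heading)
    return headings


# Subject headings taken from the example above; the subfield codes and the
# 245 title are assumed for illustration.
record: List[Field] = [
    ("245", [("a", "An illustrative title")]),
    ("650", [("a", "Indians of North America"), ("x", "Languages.")]),
    ("650", [("a", "Indians of North America"), ("z", "California")]),
    ("650", [("a", "Indian baskets"), ("z", "North America")]),
]

print([full_heading(f) for f in record if f[0].startswith("6")])
print(reduce_subjects(record))
```

Under these assumptions the three library headings collapse to the two GBS headings, which shows how much subject detail is lost once the subdivisions are dropped.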
> Given that OCLC is a membership organization, the members should be able
> to ask about any agreement between OCLC and Google, and should be able
> to instruct OCLC to release the full records to Google. Without the full
> records, any effort to coordinate GBS and library catalogs is going to
> be very difficult, and will, in the end, cost libraries considerable
> time and effort. It will also make the Google institutional product
> (should the settlement be blessed by the court) much less useful to
> libraries. This is all so nonsensical that I cannot understand WHY the
> participating libraries have agreed to it, yet it couldn't be the way it
> is without their agreement... and without their silence.
> kc
> James Weinheimer wrote:
>> I wrote this on Autocat, and thought that readers of this list might be
>> interested as well. JW
>> Now that I have read the entire article and the in-depth response from
>> Google, I must
>> say that I think (more probably, I *hope*) that this may be the beginning of
>> one of the most important discussions on cataloging and "metadata" today,
>> and perhaps of all time. The importance comes not so much from what they
>> say--which is rather elementary--but from the importance that the
>> non-library community places on these issues, and even more importantly,
>> this discussion is taking place not within the dusty pages of some forgotten
>> issue of a library journal or on a closed specialist listserv, but on an
>> open, important scholarly website (not a library website) and replied to by
>> the most important information company in the world. This could be a moment
>> for librarians, and especially catalogers (who are the experts in any case),
>> to take advantage of a soap box that may be temporary. But we can't be too
>> technical or overwhelming in our arguments.
>> Some observations:
>> 1) it looks as if one of my predictions for the future is already outdated.
>> I had predicted that "all metadata" would be thrown together into a single
>> database somewhere resulting in a huge mess. According to the fellow from
>> Google, this has happened already since he mentions metadata they have taken
>> from Brazil, Armenia, Korea and a few other places. It is interesting that
>> no one anywhere discusses this in terms of "rules" or "standards" but as an
>> Armenian database, or a Brazilian database, instead of an "AACR2 database"
>> or "ISBD" or German or French or Italian rules, or whatever. Perhaps when
>> the discussion is being led by non-expert metadata creators, this should not
>> be surprising. (For the sake of clarification, in this discussion, there is
>> an expert metadata *user* (a professor) and an expert metadata *aggregator*
>> (the fellow from Google), but no expert metadata *creator*)
>> 2) A lot of the errors that the Google fellow blamed on libraries make me
>> skeptical, to say the least. As one example, he mentions, "Geoff identifies
>> a topology text (I assume this is Curvature and Betti Numbers) as belonging
>> to Didactic Poetry; this beaut comes to us from an aggregator of library
>> catalogs. Perhaps the subject heading "Differential Geometry" was next to it
>> in an alphabetic list, and a cataloger chose wrong."
>> Sorry, but I can't buy that one. While catalogers certainly make lots of
>> mistakes, they make certain types of mistakes, and these types are quite
>> different from mistakes made by a computer. Unless this subject was assigned
>> by a human with no understanding of the English language (perhaps a
>> secretary in Korea who does not understand English), then this is, without a
>> doubt, a computer mistake.
>> 3) The fellow from Google points out some other human mistakes that are
>> highly interesting and that we should consider at length. All of the
>> problems pointed out are rather elementary, but we know there are problems
>> in cataloging that are truly difficult. How are corporate bodies handled?
>> Uniform titles? Anonymous works? Pseudonyms?
>> 4) Taken as a whole, it appears that the general public considers that
>> "metadata quality" is important, which is absolutely great and something
>> that we must capitalize upon. But the comments make it imperative that we
>> see the problems with metadata today not only in terms of our own
>> collections or our own communities, but how to make bibliographic metadata
>> in general interoperable and coherent among all metadata creators in all
>> communities on a world-wide scale. Google is forcing the issue.
>> How will "human expert-created" metadata work in an environment similar to
>> Google Books? I still think people will want to search one database (just
>> like they do Google) and this initial search will almost always be a
>> full-text keyword search on a corpus of text. The metadata we make will
>> allow for clickable limits, similar to how it works in Koha and WorldCat now
>> (of course, these work only with the catalog records, not with full text).
>> See, e.g. in the Athens County Public Library how the headings are extracted
>> from the records retrieved in the multiple display so that users can narrow
>> their results:
>> In a new system including full-text, this method could be expanded
>> indefinitely to add automatically extracted keywords, Web2.0-type results
>> (ratings, suggestions by others) and other limits.
>> Therefore, I don't think people will be browsing subjects or name headings
>> in their initial searches. In the "limits" there may be some browsing
>> performed. Therefore, how could these types of browsing be made the most
>> useful with multiple rules, forms of names, and the problems mentioned in
>> the Language Log post?
>> This is the environment we are entering. If we expect everybody to follow
>> AACR2 and/or RDA we are simply being unrealistic. Instead of creating new
>> rules that 1/10 of one percent of the world will use, we should be focusing
>> our energies on making what we have now more useful and coherent. From the
>> message and comments on Language Log, it seems as if our public wants this,
>> and even Google itself seems to be taking these things seriously.
>> This new world is largely unknown and we must feel our way along, especially
>> in this difficult economic climate. ISBD was a great beginning that we can
>> and should build upon, but today we must look beyond the library community
>> to everyone in the same field. This is happening whether we like it or not.
>> Google is forcing our hand by throwing everything in together.
>> James Weinheimer
>> Director of Library and Information Services
>> The American University of Rome
>> Rome, Italy
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> ph.: 510-540-7596   skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
Received on Thu Sep 10 2009 - 12:52:36 EDT