Re: An article to warm the hearts of cataloguers

From: Karen Coyle <lists_at_nyob> Date: Thu, 10 Sep 2009 02:46:06 -0700 To: NGC4LIB_at_LISTSERV.ND.EDU

Well, that very carefully avoids answering the question. I will repeat  
the question, clearly:

Are there restrictions on what MARC fields and subfields Google can  
include in its metadata?

kc

Quoting "Goldner,Matt" <goldnerm_at_OCLC.ORG>:

> I am posting this on behalf of my colleague Chip Nilges regarding
> Google Books and OCLC.
> Matt Goldner
>
> Matthew R. Goldner
> Product & Technology Advocate
> OCLC
>
>
> I wanted to clarify what OCLC is doing with WorldCat and Google   
> Books.  We've made the entire WorldCat database (excluding certain   
> metadata records that OCLC is contractually prohibited from   
> providing) available to Google to support discovery of the books   
> Google has digitized from library collections.  In exchange, Google   
> has agreed to display a link to libraries on pages describing   
> library digitized materials.  Google is also providing OCLC with the
> metadata needed to represent, in WorldCat, all of the library
> materials Google is digitizing.
>
> Our focus in structuring the agreement was to support the interests   
> of our members, who wanted WorldCat records to be used for their   
> digitized collections in Google.  We also wanted to ensure that   
> libraries were present as a choice on the pages describing their   
> digitized content.
>
> We continue to work with Google.  We expect the relationship to   
> evolve to meet the needs of our members, and we are listening   
> closely to these discussions.
>
> Chip Nilges
> Vice President, Business Development
> OCLC
>
> -----Original Message-----
> From: Next generation catalogs for libraries   
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
> Sent: Monday, September 07, 2009 11:56 AM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] An article to warm the hearts of cataloguers
>
> I was at the meeting where Nunberg presented this, and Dan Clancy of
> Google responded there as well. I believe that what Google says about
> using library data is only partially true. There are perfectly good
> library records for most of the books that Google has scanned -- after
> all, those books come from academic libraries. Take a book with terrible
> metadata and look it up in one of the participating library catalogs --
> you usually find good metadata. One commenter says:
>
> "But I cannot see why Google could not license Library of Congress and
> OCLC catalogs to improve the metadata."
>
> Obviously, Google should be able to use the data from the participating
> libraries, and, if the items are at LC, it could retrieve the record
> from LC's database without restrictions. I know this is possible because
> the Open Library is using this technique to retrieve bibliographic data
> for the public domain books scanned by Google.
>
> However, Google *does* have a contract with OCLC, as OCLC is supposedly
> creating a WorldCat record for each book digitized by Google. I believe
> that OCLC wishes to include the Google holdings in WorldCat, and the
> holding library will be Google. If you remember the early Google Books
> data, it was minimal, and did not include subject headings. A reliable
> Google source once dropped an aside during a discussion of how bad that
> early metadata was (and of why they weren't keeping the whole MARC record
> that they got from the libraries): "OCLC won't let us." Unfortunately,
> there is no way to know if OCLC has a contract with Google, nor what it
> says. I asked this question directly of Dan Clancy at the meeting when
> Nunberg spoke, and Clancy did not answer. My question was: "Do you have
> a contract with OCLC? And does it restrict what data you can use?"
>
> To add to that, I have some odd... clues. If you look at a record on GBS
> and the same record in the providing library, the subject headings
> follow a pattern:
>
> GBS:
>   Indians of North America
>   Indian baskets
>
> Library:
>   Indians of North America -- Languages.
>   Indians of North America -- California
>   Indian baskets -- North America
>
> This is the same pattern that appeared in the records released by the
> University of Michigan for their public domain scanned books -- only the
> $a of the 6XX field was included. (I wrote about this:
> http://kcoyle.blogspot.com/2008/05/amputation.html). Many other fields
> are also excluded from those records. You can see here a post from
> someone at Michigan to this very list:
>   http://serials.infomotions.com/ngc4lib/archive/2008/200805/0676.html
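To make the pattern concrete, here is a minimal Python sketch. It is not code that Michigan or Google is known to run. It models each subject field as (subfield code, value) pairs and assumes the library headings quoted above are coded with $x (topical) and $z (geographic) subdivisions; that coding is an assumption, not taken from the actual records. Keeping only $a and dropping the resulting duplicates turns the three library headings into the two GBS headings.

```python
from typing import List, Tuple

# One 6XX subject field, modeled as (subfield code, value) pairs.
Subfield = Tuple[str, str]

# ASSUMED coding of the three library headings quoted above:
# $a = main heading, $x = topical subdivision, $z = geographic subdivision.
LIBRARY_SUBJECTS: List[List[Subfield]] = [
    [("a", "Indians of North America"), ("x", "Languages.")],
    [("a", "Indians of North America"), ("z", "California")],
    [("a", "Indian baskets"), ("z", "North America")],
]

# Subfields that a catalog display typically strings together with " -- ".
DISPLAY_CODES = {"a", "v", "x", "y", "z"}


def full_heading(field: List[Subfield]) -> str:
    """Main heading plus all subdivisions, as the library catalog shows it."""
    return " -- ".join(value for code, value in field if code in DISPLAY_CODES)


def a_only(field: List[Subfield]) -> str:
    """Keep only $a, mimicking the stripped records described above."""
    return next(value for code, value in field if code == "a")


def stripped_headings(fields: List[List[Subfield]]) -> List[str]:
    """$a-only headings with duplicates removed, preserving first-seen order."""
    return list(dict.fromkeys(a_only(f) for f in fields))


print([full_heading(f) for f in LIBRARY_SUBJECTS])
print(stripped_headings(LIBRARY_SUBJECTS))
```

Note that the truncation loses information that cannot be recovered: once only $a survives, nothing in the record says the book is about California or about languages.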
>
> In other words, we have shot ourselves in the foot by not allowing
> Google to make use of the metadata created by libraries.
> Given that OCLC is a membership organization, the members should be able
> to ask about any agreement between OCLC and Google, and should be able
> to instruct OCLC to release the full records to Google. Without the full
> records, any effort to coordinate GBS and library catalogs is going to
> be very difficult, and will, in the end, cost libraries considerable
> time and effort. It will also make the Google institutional product
> (should the settlement be blessed by the court) much less useful to
> libraries. This is all so nonsensical that I cannot understand WHY the
> participating libraries have agreed to it, yet it couldn't be the way it
> is without their agreement... and without their silence.
>
> kc
>
> James Weinheimer wrote:
>> I wrote this on Autocat, and thought that readers of this list might be
>> interested as well. JW
>>
>> Now that I have read the entire article
>> http://languagelog.ldc.upenn.edu/nll/?p=1701 and the in-depth response from
>> Google http://languagelog.ldc.upenn.edu/nll/?p=1701#comment-41758, I must
>> say that I think (more probably, I *hope*) that this may be the beginning of
>> one of the most important discussions on cataloging and "metadata" today,
>> and perhaps of all time. The importance comes not so much from what they
>> say--which is rather elementary--but from the importance that the
>> non-library community places on these issues, and even more importantly,
>> this discussion is taking place not within the dusty pages of some forgotten
>> issue of a library journal or on a closed specialist listserv, but on an
>> open, important scholarly website (not a library website) and replied to by
>> the most important information company in the world. This could be a moment
>> for librarians, and especially catalogers (who are the experts in any case),
>> to take advantage of a soap box that may be temporary. But we can't be too
>> technical or overwhelming in our arguments.
>>
>> Some observations:
>> 1) It looks as if one of my predictions for the future is already outdated.
>> I had predicted that "all metadata" would be thrown together into a single
>> database somewhere resulting in a huge mess. According to the fellow from
>> Google, this has happened already since he mentions metadata they have taken
>> from Brazil, Armenia, Korea and a few other places. It is interesting that
>> no one anywhere discusses this in terms of "rules" or "standards" but as an
>> Armenian database, or a Brazilian database, instead of an "AACR2 database"
>> or "ISBD" or German or French or Italian rules, or whatever. Perhaps when
>> the discussion is being led by non-expert metadata creators, this should not
>> be surprising. (For the sake of clarification, in this discussion, there is
>> an expert metadata *user* (a professor) and an expert metadata *aggregator*
>> (the fellow from Google), but no expert metadata *creator*.)
>>
>> 2) A lot of the errors that the Google fellow blamed on libraries make me
>> skeptical, to say the least. As one example, he mentions, "Geoff identifies
>> a topology text (I assume this is Curvature and Betti Numbers) as belonging
>> to Didactic Poetry; this beaut comes to us from an aggregator of library
>> catalogs. Perhaps the subject heading "Differential Geometry" was next to it
>> in an alphabetic list, and a cataloger chose wrong."
>>
>> Sorry, but I can't buy that one. While catalogers certainly make lots of
>> mistakes, they make certain types of mistakes, and these types are quite
>> different from mistakes made by a computer. Unless this subject was assigned
>> by a human with no understanding of the English language (perhaps a
>> secretary in Korea who does not understand English), this is, without a
>> doubt, a computer mistake.
>>
>> 3) The fellow from Google points out some other human mistakes that are
>> highly interesting and that we should consider at length. All of the
>> problems pointed out are rather elementary, but we know there are problems
>> in cataloging that are truly difficult. How are corporate bodies handled?
>> Uniform titles? Anonymous works? Pseudonyms?
>>
>> 4) Taken as a whole, it appears that the general public considers
>> "metadata quality" important, which is absolutely great and something
>> that we must capitalize upon. But the comments make it imperative that we
>> see the problems with metadata today not only in terms of our own
>> collections or our own communities, but also in terms of how to make
>> bibliographic metadata in general interoperable and coherent among all
>> metadata creators in all communities on a world-wide scale. Google is
>> forcing the issue.
>>
>> How will "human expert-created" metadata work in an environment similar to
>> Google Books? I still think people will want to search one database (just
>> like they do Google) and this initial search will almost always be a
>> full-text keyword search on a corpus of text. The metadata we make will
>> allow for clickable limits, similar to how they work in Koha and WorldCat
>> now (although those systems work only with catalog records, not with
>> full text). See, e.g., how the Athens County Public Library catalog
>> extracts the headings from the records retrieved in the multiple display
>> so that users can narrow their results:
>> http://acpl.kohalibrary.com/cgi-bin/koha/opac-search.pl?q=roman+archaeology.
>> In a new system including full-text, this method could be expanded
>> indefinitely to add automatically extracted keywords, Web2.0-type results
>> (ratings, suggestions by others) and other limits.
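The "clickable limits" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Koha's or WorldCat's implementation: the `Record` type, the record IDs, and the headings are all invented for the example. The sketch counts each heading once per retrieved record, lists the most frequent first, and lets a click on a heading narrow the result set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    record_id: str
    subjects: List[str]


# HYPOTHETICAL records retrieved by a keyword search; the data is invented.
RESULTS = [
    Record("r1", ["Archaeology", "Rome -- Antiquities"]),
    Record("r2", ["Rome -- Antiquities"]),
    Record("r3", ["Excavations (Archaeology)", "Rome -- Antiquities"]),
]


def subject_facets(results: List[Record]) -> List[Tuple[str, int]]:
    """Count each heading once per record; most frequent first, ties alphabetical."""
    counts = Counter(h for r in results for h in set(r.subjects))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def narrow(results: List[Record], heading: str) -> List[Record]:
    """Apply a clicked limit: keep only records that carry the heading."""
    return [r for r in results if heading in r.subjects]


for heading, count in subject_facets(RESULTS):
    print(f"{heading} ({count})")
print([r.record_id for r in narrow(RESULTS, "Archaeology")])
```

Because the facets are computed only from whatever headings the records carry, the same code would also count automatically extracted keywords or user tags if they were added to each record's list.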
>>
>> Therefore, I don't think people will be browsing subjects or name headings
>> in their initial searches. Some browsing may be performed within the
>> "limits." So how could these types of browsing be made most useful,
>> given multiple rules, forms of names, and the problems mentioned in
>> the Language Log post?
>>
>> This is the environment we are entering. If we expect everybody to follow
>> AACR2 and/or RDA we are simply being unrealistic. Instead of creating new
>> rules that 1/10 of one percent of the world will use, we should be focusing
>> our energies on making what we have now more useful and coherent. From the
>> message and comments on Language Log, it seems as if our public wants this,
>> and even Google itself seems to be taking these things seriously.
>>
>> This new world is largely unknown and we must feel our way along, especially
>> in this difficult economic climate. ISBD was a great beginning that we can
>> and should build upon, but today we must look beyond the library community
>> to everyone in the same field. This is happening whether we like it or not.
>> Google is forcing our hand by throwing everything in together.
>>
>> James Weinheimer  j.weinheimer_at_aur.edu
>> Director of Library and Information Services
>> The American University of Rome
>> Rome, Italy
>>
>
>
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kcoyle@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596   skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
>