An article to warm the hearts of cataloguers

From: James Weinheimer <j.weinheimer_at_nyob>
Date: Mon, 7 Sep 2009 08:34:42 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

I wrote this on Autocat, and thought that readers of this list might be
interested as well. JW

Now that I have read the entire article
http://languagelog.ldc.upenn.edu/nll/?p=1701 and the in-depth response from
Google http://languagelog.ldc.upenn.edu/nll/?p=1701#comment-41758, I must
say that I think (more probably, I *hope*) that this may be the beginning of
one of the most important discussions on cataloging and "metadata" today,
and perhaps of all time. The significance lies not so much in what they
say--which is rather elementary--but in the weight that the non-library
community places on these issues. Even more importantly, this discussion is
taking place not within the dusty pages of some forgotten issue of a library
journal or on a closed specialist listserv, but on an open, important
scholarly website (not a library website), with a reply from the most
important information company in the world. This could be a moment
for librarians, and especially catalogers (who are the experts in any case),
to take advantage of a soap box that may be temporary. But we can't be too
technical or overwhelming in our arguments.

Some observations:
1) It looks as if one of my predictions for the future is already outdated.
I had predicted that "all metadata" would be thrown together into a single
database somewhere, resulting in a huge mess. According to the fellow from
Google, this has already happened, since he mentions metadata they have taken
from Brazil, Armenia, Korea and a few other places. It is interesting that
no one anywhere discusses this in terms of "rules" or "standards"; they speak
of an Armenian database or a Brazilian database, not of an "AACR2 database"
or "ISBD" or German or French or Italian rules, or whatever. Perhaps this
should not be surprising when the discussion is being led by people who are
not expert metadata creators. (For the sake of clarification, in this
discussion there is an expert metadata *user* (a professor) and an expert
metadata *aggregator* (the fellow from Google), but no expert metadata
*creator*.)

2) I am skeptical, to say the least, about many of the errors that the Google
fellow blamed on libraries. As one example, he mentions, "Geoff identifies
a topology text (I assume this is Curvature and Betti Numbers) as belonging
to Didactic Poetry; this beaut comes to us from an aggregator of library
catalogs. Perhaps the subject heading "Differential Geometry" was next to it
in an alphabetic list, and a cataloger chose wrong."

Sorry, but I can't buy that one. While catalogers certainly make lots of
mistakes, they make certain types of mistakes, and these types are quite
different from mistakes made by a computer. Unless this subject was assigned
by a human with no understanding of the English language (perhaps a
secretary in Korea who does not understand English), then this is, without a
doubt, a computer mistake.

3) The fellow from Google points out some other human mistakes that are
highly interesting and that we should consider at length. All of the
problems pointed out are rather elementary, but we know there are problems
in cataloging that are truly difficult. How are corporate bodies handled?
Uniform titles? Anonymous works? Pseudonyms?

4) Taken as a whole, it appears that the general public considers "metadata
quality" important, which is absolutely great and something that we must
capitalize upon. But the comments make it imperative that we see the problems
with metadata today not only in terms of our own collections or our own
communities, but also in terms of how to make bibliographic metadata in
general interoperable and coherent among all metadata creators in all
communities on a world-wide scale. Google is forcing the issue.

How will "human expert-created" metadata work in an environment similar to
Google Books? I still think people will want to search one database (just
like they do Google) and this initial search will almost always be a
full-text keyword search on a corpus of text. The metadata we make will
allow for clickable limits, similar to how it works in Koha and WorldCat now
(of course, these work only with the catalog records, not with full text).
See, e.g., how the Athens County Public Library catalog extracts the headings
from the records retrieved in the multiple display so that users can narrow
their results:
http://acpl.kohalibrary.com/cgi-bin/koha/opac-search.pl?q=roman+archaeology
In a new system including full-text, this method could be expanded
indefinitely to add automatically extracted keywords, Web2.0-type results
(ratings, suggestions by others) and other limits.
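
To make the idea of clickable limits a little more concrete, here is a rough
sketch in Python. It is only an illustration under stated assumptions: the
record layout, the sample titles and headings, and the function names are all
hypothetical, and none of this is taken from how Koha or WorldCat actually
implement their limits. It shows a keyword search followed by headings being
counted across the retrieved records and then used to narrow the results.

```python
from collections import Counter

# Hypothetical sample records; a real catalog would supply these.
RECORDS = [
    {"title": "Roman Archaeology in Britain",
     "subjects": ["Archaeology", "Rome", "Great Britain"]},
    {"title": "The Roman Forum",
     "subjects": ["Archaeology", "Rome"]},
    {"title": "Pompeii",
     "subjects": ["Archaeology", "Pompeii (Extinct city)"]},
    {"title": "Greek Pottery",
     "subjects": ["Pottery, Greek"]},
]


def keyword_search(records, query):
    """Return records whose title contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [r for r in records
            if all(term in r["title"].lower() for term in terms)]


def facet_counts(results):
    """Count how often each subject heading occurs in the retrieved records."""
    return Counter(s for r in results for s in r["subjects"])


def narrow(results, heading):
    """Keep only the retrieved records that carry the chosen heading."""
    return [r for r in results if heading in r["subjects"]]


hits = keyword_search(RECORDS, "roman")
facets = facet_counts(hits)
narrowed = narrow(hits, "Great Britain")

# Show the headings a user could click on, most frequent first.
for heading, count in facets.most_common():
    print(f"{heading} ({count})")
```

In a system that also held full text, the same counting step could in
principle run over automatically extracted keywords or ratings as well as
over the headings we create.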

Therefore, I don't think people will be browsing subjects or name headings
in their initial searches, although some browsing may take place within the
"limits". How, then, could these types of browsing be made the most useful
given multiple rules, forms of names, and the problems mentioned in the
Language Log post?

This is the environment we are entering. If we expect everybody to follow
AACR2 and/or RDA, we are simply being unrealistic. Instead of creating new
rules that 1/10 of one percent of the world will use, we should be focusing
our energies on making what we have now more useful and coherent. From the
message and comments on Language Log, it seems as if our public wants this,
and even Google itself seems to be taking these things seriously.

This new world is largely unknown and we must feel our way along, especially
in this difficult economic climate. ISBD was a great beginning that we can
and should build upon, but today we must look beyond the library community
to everyone in the same field. This is happening whether we like it or not.
Google is forcing our hand by throwing everything in together.

James Weinheimer  j.weinheimer_at_aur.edu
Director of Library and Information Services
The American University of Rome
Rome, Italy