Tim Spalding wrote:
> I fundamentally disagree. I really need to write something about this
> some day--how and why many librarians love the "semantic web."
> Fundamentally, and with respect, I think librarian interest in the
> semantic web, is effectively about learning as *little* as
> possible—learning nothing from Google, particularly. The semantic web
> feels familiar because it is or seems authoritative, controlled,
> top-down, binary/certain, standards-driven, committee-based, highly
> ordered and in opposition to the "mess" that proven so amazingly
> useful. The semantic web is a sort of "web do-over," the web as
> envisioned by librarians. As a matter of development, the idea keeps
> getting smaller and smaller—from a sort of AI utopia to "linked data"
> and "microformats."
I am surprised that you are against the semantic web concept, and I would be interested in more details about your views. To me, the semantic web is not about doing things the way we have always done them, i.e. "authoritative, controlled, top-down, binary/certain, standards-driven, committee-based, highly ordered"; rather, it attempts to let people get some sort of reliable results from their searches. It seems to me that even semi-serious research is all about finding reliable groups of materials related conceptually in different ways, such as, from my previous example, "the memoirs of U.S. soldiers who fought in WWI." I think this is a realistic search and nothing strange.
What do people believe they are retrieving when they search that kind of phrase in Google? Perhaps people really don't care about the results when they do a search like this and are satisfied with whatever comes up, but that is certainly not my experience, and it goes against any serious definition of research. So, for those people who actually do care about the materials they are searching for, do they realize that doing a Google search is like putting a pair of dice into a cup, shaking it, and watching what comes out? Do they actually believe that the 500,000 or so hits they get really are what is available in Google concerning "the memoirs of U.S. soldiers who fought in WWI"? And that the top hits really are the most "relevant" to their topic?
I think lots of people, if not most, really do believe this, but it is very important for us to understand that they are definitely not getting that type of result, and to try to let our users know this as well. Library information literacy classes talk a lot about this. Google is a black box that throws out all kinds of results. That doesn't mean the results are bad, but are they so reliable for genuine research that other methods can be ignored and thrown out? It seems to me that if we want to even keep open the possibility of doing research--real research, not just accepting what comes out of the Google one-armed bandit--the only option is to get some kind of control over this material. I concede that extending traditional library methods is simply unrealistic.
The semantic web approach is the only one I have seen that takes all of this seriously. The library-sponsored programs I have seen confine themselves to explaining what libraries have always done. The Semantic Web, on the other hand, first recognizes that there is a problem and, second, attempts to solve it. Traditional library methods that are "authoritative, controlled, top down, etc." may offer some guidance but are not adequate for this task. I think the goal of the Semantic Web offers an excellent guide for libraries to follow if they are to remain relevant in the coming information universe.
>Take the book numbers and imagine work is really only done once. If
>each title took a full hour to catalog, at 40-hours-52-weeks you'd
>need only 72 librarians to catalog all 150,000 books produced last
>year. How many catalogers are there in the United States anyway? If
>every book took ten hours, you'd only need 720. I'm guessing there are
>more than 720 too.
When I started cataloging, they figured 1 hour for an original catalog record, and this included full NACO authority headings, which could take quite a bit of time. I also worked with Cyrillic-based languages, which demand more time. Now, though, it's more like ½ hour. But as you point out, even if everybody did a record every 10 minutes, production as a whole has to be organized so that not everyone is working on the same materials but each is working on something different; otherwise productivity remains stifled.
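Tim's back-of-the-envelope figures are easy to verify; here is a minimal sketch, assuming his numbers of 150,000 new titles per year and a 40-hour week for 52 weeks (2,080 working hours per cataloger per year), and assuming each record is created exactly once with no duplicated effort. (His "72" and "720" come out of rounding down; rounding up to whole people gives 73 and 722.)

```python
# Back-of-the-envelope check of the cataloging-capacity figures above.
# Assumed inputs: 150,000 new titles per year; 40 hours/week * 52 weeks;
# every title cataloged exactly once (no duplicated effort across libraries).

TITLES_PER_YEAR = 150_000
HOURS_PER_YEAR = 40 * 52  # 2,080 working hours per cataloger per year

def catalogers_needed(hours_per_title):
    """Catalogers required if each record is created only once."""
    total_hours = TITLES_PER_YEAR * hours_per_title
    # Round up: a fractional cataloger still requires a whole person.
    return -(-total_hours // HOURS_PER_YEAR)

print(catalogers_needed(1))    # one hour per title -> about 73 catalogers
print(catalogers_needed(10))   # ten hours per title -> about 722
print(catalogers_needed(0.5))  # the half hour I cite above -> about 37
```

The point of the arithmetic is the "done once" assumption: the totals are tiny only if the work is not repeated at every library, which is exactly the workflow question.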
Still, it is more important that catalogers change their worldview: their universe should no longer be the local catalog and the materials inside their own, local library. It has already changed for our users, especially the younger ones, and I believe for most reference librarians. Since the information universe has changed, materials outside the local cataloged collection must be dealt with as seriously as the materials within our catalogs: not only the journal indexes and full-text materials, which are just as much a part of the local collection as a book on the shelves (and, except in the earliest days, never included in the catalog at an analytical level), but also the fabulous materials available online through digital projects around the world. Our users want these things, and we must respond somehow.
Although a case can be made that a university's particular copy of a book cataloged by another library must be reviewed, because there can be subtle but highly important changes from one physical copy to another, when we consider online materials, where everyone is looking at precisely the same item, it makes no sense for each library to create catalog records--even through copy cataloging--and maintain them. (I could even argue that it makes no sense to select them separately, but that is another discussion.)
For me, from this point on, the question turns into one of workflow: what is the most efficient way of creating high-quality records (however that is defined) that can be shared by all concerned? Obviously, this involves standards, new shared computer systems, and cooperation with many groups and fields that libraries have not worked with before, from publishers to secretaries to scholars and researchers. This involves massive change, and I fear it may be too much change.
Many catalogers, including myself, have wanted to ignore these materials outside the local collection for legitimate reasons (e.g. they disappear and change without notice) but that is the part that I believe is no longer sustainable. Our worldview must change along with the world, and if the worldview of our users has changed to include materials on the internet, so must ours.
Will libraries be able to adapt to this "new" world? (It's already been around for almost 10 years and can hardly be called "new" anymore!) My personal optimism waxes and wanes, and right now it is waning. I see no real push in the library world for methods and projects that seriously deal with this reality. RDA only defines more precisely what we currently do and certainly does not discuss the transformations in metadata use or creation. Even the recent "Statement of cataloguing principles" has very little that is new, still insisting that users want to find works, expressions, manifestations, and items through authors, titles, and subjects--something completely out of step with modern research and with what users actually do (and I suspect it is not what the authors of the statement really do either). There is, however, under 4. Objectives and Functions of the Catalogue:
4.5. to navigate within a catalogue and beyond
which at least mentions "and beyond"--two little words that hold enormous discussions and possibilities behind them. A hopeful sign, but a very small one.
Jim Weinheimer
Received on Sun Mar 01 2009 - 11:18:15 EST