Re: opac live search

From: Weinheimer Jim <j.weinheimer_at_nyob> Date: Tue, 24 Feb 2009 09:08:10 +0100 To: NGC4LIB_at_LISTSERV.ND.EDU

> First, let me apologize for my compressed and somewhat hostile tone. I
> appreciate your willingness to play ball even so. Ultimately, even if
> I disagreed with you on every point--which is not at all so--I would
> be grateful for the opportunity to discuss it thoughtfully with you.

No offense taken, Tim. I've always said that it would be a very boring place if everyone agreed. As Socrates showed long ago, no single person has the truth, and it is only through argument and dialog that any steps toward truth can be made. I especially value your opinion since you come from a completely different perspective. But I still think you can be wrong on occasion. :-)

> But again, Jim, this is false. You're not Googling these things and
> looking at the results. In fact, primary-source documents annotated
> with the term "World War I" are the rule online, not the exception.

Here you put your finger on what I am trying to say. It is an example of something that a librarian like me takes for granted and so doesn't even mention, while a non-librarian doesn't see it. You use the term "annotated." In library-speak, this is called cataloging. In the library catalog, people go to a huge amount of trouble to "annotate" relevant materials in this way, but in far more detail and in much more standardized ways, so that people can reliably find the materials in a collection in specific ways. It's an expert system, and it absolutely must change now that it's on the web, but in any case, in a library catalog you really can find "the set of all resources on World War I." This comes with the caveat that "everything" should not be seen in absolute terms; within certain known parameters, and accepting some human error, the catalog allows you to do that. These sets must be maintained as well.

I've always thought that cataloging is one of the best examples of semiotics in action: you are dealing quite literally with the full range of language, all its shades of meaning, and continuous language change. Terms that meant one thing at one time mean something else later or become completely meaningless. Catalogers can give you all kinds of examples, such as "moving pictures" and "electric lamps, incandescent," to name only a couple. All these changes have serious consequences for the catalog.

Take the example above: "World War I" can also be called "wwi," "ww1," or "World War One," and people wrote about it even before it had a real name. When we include different languages with all their variant forms as well, the complexity of just this single concept is breathtaking, and it is extremely complex to create and maintain.

> It's library subjects—which used "World War, 1914-1918"—that have the problem, not the web.

I agree, but this is a problem with the *label,* not with the sets created and maintained by librarians. We bring together materials on the same concept; this is our own version of one part of the Semantic Web. I still believe people want *reliable* results from the traditional access points: authors, titles, and subjects. The methods we use to achieve this were worked out in the 17th-19th centuries, and although they have been elaborated upon, they haven't really been brought into the 21st century. It's time to do it.

> No primary sources? Well, from the first ten links Link #5 is the
> "World War I Document Archive" from Brigham Young University, a
> massive collection of primary sources, helpfully subdivided into
> Conventions and Treaties, Official papers, Diaries, Image Archive,
> etc. Link #9 is the website "Eyewitness to history," which excerpts a
> few dozen personal accounts of the war, a good introduction to some
> useful or interesting resources.

Yes, you can find materials on WWI, and many are wonderful, but that is not the point. What I try to tell my students is that they are not searching the *concept* "wwi" but the *text* "wwi," which is not at all the same. Most people believe, quite logically, that when they search "wwi" they are looking at the materials on WWI, but they very definitely are not. I try to get them to ask: What are you looking at? Why is a certain site number 1 or number 50? Who decided that? Does this Google display give you a coherent overview of the different types of materials available to you over the web? And most importantly, what are you not seeing? Additionally, what are your assumptions? Are you assuming that for every resource that is about WWI and is "worthwhile" (whatever that means), someone out in the ether decided to "annotate" it with the string "World War I" and not "World War 1" or "wwi"? Out of the 10 zillion hits in Google, how many are not about World War I, and how many materials about World War I are you missing because nobody annotated them? Can you even know?

In a library catalog, there are controls that allow you to get a meaningful answer to these questions. For each resource that is, in the cataloger's opinion (not just anyone's, but that of a trained professional whose work is very often reviewed by others), about WWI, the cataloger must use the string "World War, 1914-1918" and not "WWI," or they are fired. Out the door. In Google, there is nothing like this. Again, there is nothing necessarily wrong with the Google method, but it is definitely a black box that you cannot open.
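A minimal Python sketch may make the difference between searching the text and searching the concept more concrete. Everything in it is hypothetical: the variant table, the sample records, and the function names are made-up illustrations, not real LCSH data and not how any actual catalog or search engine is implemented. The idea is only that a catalog routes every variant form through one authorized heading, while a plain string match finds whatever someone happened to type, including things it shouldn't.

```python
# Hypothetical cross-references: variant forms -> one authorized heading.
# Illustrative only; not taken from any real authority file.
AUTHORIZED = "World War, 1914-1918"
SEE_REFERENCES = {
    "world war i": AUTHORIZED,
    "world war 1": AUTHORIZED,
    "world war one": AUTHORIZED,
    "wwi": AUTHORIZED,
    "ww1": AUTHORIZED,
    "great war": AUTHORIZED,
}

# Web-style records: free-text tags chosen by whoever made the page.
web_pages = {
    "page-a": ["World War I", "poetry"],
    "page-b": ["WW1", "diaries"],
    "page-c": ["The Great War", "letters"],
    "page-d": ["World War II", "aviation"],
}

# Catalog-style records: every subject starts with the authorized form.
catalog = {
    "rec-1": [AUTHORIZED + "--Personal narratives, American"],
    "rec-2": [AUTHORIZED + "--Poetry"],
}


def text_search(records, query):
    """Match the query string literally (case-insensitive) anywhere in a tag."""
    q = query.lower()
    return sorted(rid for rid, tags in records.items()
                  if any(q in tag.lower() for tag in tags))


def concept_search(records, query):
    """Map the query to its authorized heading, then match headings starting with it."""
    heading = SEE_REFERENCES.get(query.lower().strip(), query)
    return sorted(rid for rid, tags in records.items()
                  if any(tag.startswith(heading) for tag in tags))


if __name__ == "__main__":
    print(text_search(web_pages, "wwi"))          # misses every page
    print(text_search(web_pages, "World War I"))  # also pulls in World War II
    print(concept_search(catalog, "wwi"))         # the whole controlled set
```

With the text search, "wwi" finds nothing and "World War I" picks up the World War II page as well, since "world war i" is a substring of "world war ii"; the concept search returns the same set no matter which variant is typed. The catch, of course, is that the set is only as good as the people maintaining the headings and the cross-references.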
>
> Turning back to libraries, I did a "Subject Search" on the Library of
> Congress for WWI and the first two hits are:
>
> 31 Eskadra Rozpoznawcza--World War II.
> Aerial gunnery--History--World War, 1939-1945.
>
> Only at three do I get a clue of what's going on:
>
> Aeronautics, Military--Germany--History--World War, 1914-1918.
>
> So, World War I isn't a valid subject term--I must use "World War,
> 1914-1918." But in this case, the "10,000 results" (a curiously round
> number!) are mostly of this form: "World War, 1914-1918" stuck at the
> end of some other subject, like "Australia--Army--Recruiting,
> enlistment, etc." or "Dutch East Indies. Militaire
> Luchtvaart--History--."
>
> This list is, of course, far too much, even if it were possible to
> remove World War II items. I couldn't figure out how to do a
> "left-anchored" search, so I would get only things that *start* with
> "World War, 1914-1918." If the system weren't so obviously borked, I'd
> be tempted to conclude that controlled vocabulary, like artificial
> intelligence in search, is an idea too far ahead of current
> technology.

But why do we consider a list in a library catalog consisting of a mere 10,000 hits or so to be too many, when a Google result routinely gives back millions of results? Certainly, there needs to be a lot of work done in a library catalog to display results (the most interesting was lcsh.info, but it was shut down. Bernhard's LCSH browser is also not bad), but I still maintain that people want these groupings that we make. When you get a semi-expert to put in a subject heading array for a book such as the following, I think it is useful in many ways:

Kniptash, Vernon E., 1897-1987 --Diaries.
United States. Army. Field Artillery, 150th.
United States. Army. Infantry Division, 42nd.
World War, 1914-1918 --Personal narratives, American.
Soldiers --United States --Diaries.
Germany --History --Allied occupation, 1918-1930.

Although these headings are obviously card-based and designed for browsing left-anchored text strings, and the labels are often weird ("Personal narratives, American"???), I still believe people will always want the materials organized under these groupings (i.e., people really do want the set of works about memoirs of U.S. citizens during WWI), although they will want to search for them in ways different from the old card catalog (this is essentially how the LC catalog works). Can some new methods of retrieval be dreamed up using this information and improving it? Absolutely! I'll bet you could come up with some good ones!
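Tim's difficulty with getting a "left-anchored" search, and the point that these headings were built for browsing, can be illustrated with a short Python sketch. The heading strings are the ones quoted in this thread; the function names are hypothetical, and nothing here claims to reproduce how the LC catalog actually indexes or searches headings. It contrasts a card-catalog style browse, which returns only headings that *begin* with the search string, with a keyword match that finds the string anywhere in the heading.

```python
from bisect import bisect_left

# Sample data: headings quoted in this thread, kept in sorted "card" order.
HEADINGS = sorted([
    "Kniptash, Vernon E., 1897-1987 --Diaries.",
    "United States. Army. Field Artillery, 150th.",
    "United States. Army. Infantry Division, 42nd.",
    "World War, 1914-1918 --Personal narratives, American.",
    "Soldiers --United States --Diaries.",
    "Germany --History --Allied occupation, 1918-1930.",
    "Aeronautics, Military--Germany--History--World War, 1914-1918.",
    "Aerial gunnery--History--World War, 1939-1945.",
])


def left_anchored(headings, prefix):
    """Card-catalog style browse: headings that start with prefix.

    In a sorted list, every heading with a given prefix sits in one
    contiguous run, so we jump to the first candidate with bisect and
    read forward until the prefix stops matching.
    """
    start = bisect_left(headings, prefix)
    hits = []
    for heading in headings[start:]:
        if not heading.startswith(prefix):
            break
        hits.append(heading)
    return hits


def keyword_anywhere(headings, term):
    """Keyword style: headings containing term at any position."""
    return [h for h in headings if term in h]


if __name__ == "__main__":
    for h in left_anchored(HEADINGS, "World War, 1914-1918"):
        print("browse: ", h)
    for h in keyword_anywhere(HEADINGS, "World War"):
        print("keyword:", h)
```

The browse returns only the heading filed under the war itself, while the keyword match also returns headings where the war is a trailing subdivision, plus the World War II heading when the search is just "World War". Neither behavior is wrong; they answer different questions, and a good interface should probably let people choose.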

> To return to Sassoon and Owen, it's also worth remarking that many of
> their books lack WWI-related subject headings in the Library of
> Congress and other libraries. They're often just poets, and the
> granularity consensus of library cataloging makes it hard for
> individual poems on a topic to surface. The web does a better job of
> contextualizing their work, of splitting it into appropriate units,
> and of making it findable by its predominant subject.

Agreed, but it is my belief, based on my experience (and not only because I'm a librarian!), that people still want the library type of access, i.e., names, titles, and subjects, and in fact, people very often think they are getting this access in Google when they very definitely are not. What they get in Google when they search "wwi" is not bad, and is highly useful, but it is quite different and does not have the controls that traditional library access provides. I'm just saying that users should be aware of this. Librarians and catalogers need to work with full text because people want it--including me. This is one of the many reasons why I say that the FRBR user tasks are completely obsolete and actually bizarre in today's world. The traditional and new methods should work together.

> Rather than a decline, I see an ascent. The web (Google, etc.) has
> taken the veil off the thing.

I don't see it as a decline either, but I tend to think that where there are gains, there are normally corresponding losses. The resources available through the web are great, but I don't consider that a real change--it's just more resources available to a lot more people than before. It's like everybody gaining access to the equivalent of a research library. That's cool, but nothing really different. Concerning the Web 2.0 tools, I shall take the traditional library view that they are still too new to determine whether they promise real changes or are just a hula-hoop. Experiment, but be careful.

But the real change today is that a lot more is asked of people and there are fewer "filters" than ever before. For better or worse, people relied on these filters, such as publishers, editors, and even librarians. My student users are interested in finding reliable, "truthful" information, and I don't think they are different from anybody else. Yet, I can't just say, "Wikipedia is not peer-reviewed so don't use it." I have to tell them exactly what it is and how to use it wisely. That's complicated, and so is teaching them how to evaluate resources, search results, and other things. People had to worry about this before, of course, but it's more serious now. On the web there are scams, pretend experts, and out-and-out propaganda, and there are many false steps for people. On the web, you don't even know where to turn for reliable help. I even have to tell my students to think about their own privacy, which in the days of Facebook seems to be disappearing or at least changing. I don't want to worry them, but I do want them at least to think about it for their own sakes. For all these reasons, I think there would be a major place for librarians in society if they take it.

I personally love this new world, but it is a different one. Bilbo Baggins's words are really apt today: he said that it is a dangerous thing to walk out of your front door. You step out on the road, and if you don't keep your feet, there's no knowing where you might be swept off to. Anybody spending time on the web today understands this very well.

Jim Weinheimer