Re: Authority in an Age of Open Access (an analysis)

From: James Weinheimer <weinheimer.jim.l_at_nyob>
Date: Thu, 8 Nov 2012 22:12:13 +0100
To: NGC4LIB_at_LISTSERV.ND.EDU

On 08/11/2012 12:45, Dave Caroline wrote:
<snip>
> I shall pick one small statement and show that "standards" do not do
> justice to the data or the users.
> If only the "professionals" looked out at the world beyond and asked
> themselves what do people want to find.
>
> Do the users really want a limited subset, no they are more likely to
> want the right subset.
> there is a difference. a standard chooses a subset with arbitrary
> rules that may have more leaning to the cost of cataloguing than the
> usability of the catalogue.
> A fulltext ocr that google can see will enable direct finding of
> information for anybody.
> An example from today, I have a trade book by a company called Barber
> Colman who made gear hobbing machines
> taking a description of one very special type of hob "mutilated-tooth
> single position hob" putting that into google took me straight to the
> patent, no messing with any catalogers view of the information and few
> would have the domain knowledge probably.
>
> One thing that library cataloging "standards" specify is only
> catalogue the first (main author/s). a travesty in a book where
> sections are all by leading authors.
</snip>

That is a very good example of the power of full-text searching, but I
don't see how this shows that standards do not do justice to the data or
the users. The search you did is only one type of search and, I would
submit, is not what most people do most of the time. People are more
interested in topics. My favorite example is to search for "World War
I". People who are interested in that topic will enter into the search
box "world war i" or "wwi" or whatever, and in Google, the response will
be millions of hits on World War I, probably with the first link
straight into Wikipedia. More than anyone could read in a lifetime. When
I have asked people if they are happy with the search, they say "yes,
look at all the hits". When I then ask them if it is a good search, the
question surprises them and no one has figured out the problem until I
tell them.

I then go on to say that a search for "World War I" *cannot*--*by
definition*--find a very important type of resource. And that is:
anything from before 1939, i.e. before World War II began. Nobody
called it World War I until there was World War II. Therefore, in a
full-text search for "world war i" it is impossible for there to be any
primary sources, nor can there be many secondary sources. The public is
not used to thinking in this way: full-text, as the term says, is
searching *text* while library catalogs were designed to allow people to
search for *concepts*. Although people may say they want resources on
"World War I", 99% are actually interested in "that big war that took
place mainly in Europe from the years 1914-1918 that killed millions, no
matter what words have been used to name it". The fact is, you can
search library catalogs for concepts--something you cannot do in Google.
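
To make the difference concrete, here is a minimal sketch in Python. It
is only an illustration, not how any real catalog or search engine is
built: the three records, their texts, the AUTHORITY table and both
function names are invented for this example. The heading strings are
written in the style of library subject headings.

```python
# Hypothetical mini-collection; titles and texts are invented for illustration.
RECORDS = [
    {"title": "Letters from the front", "year": 1916,
     "text": "An account of the Great War as seen from the trenches.",
     "subjects": ["World War, 1914-1918"]},
    {"title": "A short history of World War I", "year": 1968,
     "text": "World War I began in 1914 and ended in 1918.",
     "subjects": ["World War, 1914-1918"]},
    {"title": "Roman building techniques", "year": 1950,
     "text": "A survey of architecture in Rome.",
     "subjects": ["Architecture--Italy--Rome"]},
]

# Toy authority file (an assumption): variant names map to one authorized heading.
AUTHORITY = {
    "world war i": "World War, 1914-1918",
    "wwi": "World War, 1914-1918",
    "great war": "World War, 1914-1918",
    "first world war": "World War, 1914-1918",
}


def full_text_search(query):
    """Return titles whose text literally contains the query string."""
    q = query.lower()
    return [r["title"] for r in RECORDS if q in r["text"].lower()]


def concept_search(query):
    """Resolve the query to an authorized heading, then match on subjects."""
    heading = AUTHORITY.get(query.lower())
    if heading is None:
        return []
    return [r["title"] for r in RECORDS if heading in r["subjects"]]


print(full_text_search("World War I"))
print(concept_search("World War I"))
```

The text search can only return the 1968 history, because the 1916
letters never use the words "World War I". The concept search returns
both, whichever name the searcher happens to type, but only because
someone assigned the same heading to both records.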

In your example of "mutilated-tooth single position hob", you were
searching for the text, not trying to find out the range of information
that may be available on this hob. Is it known under any other terms?
Possibly, but that would take research. While people do occasionally
want to search text as you did, it seems that people mostly don't care
what the words are, they want to find out about architecture in Rome,
the techniques of Leonardo, and so on. These need conceptual searching.

Library catalogs were designed to allow this "conceptual access" to the
materials within a collection. In fact, it was physically impossible to
do a text search like everyone does today. There has been a complete
intellectual change and cataloging has yet to deal with it. Before, all
you could do was browse cards--you couldn't walk up to a card catalog
and say: "give me all the cards with the words 'world war i' printed on
them" and have them all fly into your hands. The most you could do would
be to search a card catalog by the beginning words of the title--that
is, if the library made title added entry cards--but that was
exceedingly hit or miss. On the other hand, you could search the cards
by their subjects and that demanded certain methods by both catalogers
and searchers, and yes: standards.

I admit that transferring this conceptual access into the computerized
catalog has not been done very well at all and has been, in my opinion,
one of the biggest disasters in the history of cataloging. Still, I
think people would very much like the conceptual access since it is
available nowhere else.

Turning to other examples of searching text vs. concepts reveals that
words (text) change constantly. What words would you choose to search
full-text for "African Americans" in documents from the 19th century?
Even from the 1960s? Or homosexuals? What about different languages? I
can remember people from the South calling the U.S. Civil War "The
War of the Northern Aggression". Here is an interesting Wikipedia
article on the topic:
http://en.wikipedia.org/wiki/Naming_the_American_Civil_War. So it is
clear that full-text searching, although it seems to be easy, actually
has to contend with the vast complexity of language change. It is strewn
with booby traps for the unsuspecting.

The *only way* the library type of conceptual access can work is through
adherence to standards by trained experts. While I think full-text is
great and use it all the time, we must question whether it is so good
that it eliminates the need for the other type of access. I certainly
don't think so. The first step though is to admit that our traditional
methods for allowing conceptual access have been a disaster in the
computer catalogs and figure out how to fix it for people today. That is
a huge task, I agree.

I am not saying that full-text is bad but, just like anything else, it has
its strengths and its weaknesses. It is my belief that merging the library
catalog with full-text would create something far more powerful than
anything we have seen so far. I think it would be really fun to try.
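
One way such a merger might look, as a rough Python sketch: use the
catalog's variant names to expand the query before searching the full
text, and add in anything carrying the subject heading. This is a guess
at a design, not a description of an existing system; VARIANTS, DOCS
and merged_search are all hypothetical names with made-up data.

```python
# Hypothetical data: one authorized heading with its variant names.
VARIANTS = {
    "World War, 1914-1918": ["world war i", "wwi", "great war", "first world war"],
}

# Hypothetical documents: id -> (full text, assigned subject headings).
DOCS = {
    "diary-1916": ("Notes written during the Great War.", []),
    "history-1968": ("World War I began in 1914.", ["World War, 1914-1918"]),
    "memoir-1920": ("My years in the army.", ["World War, 1914-1918"]),
    "rome-1950": ("A survey of architecture in Rome.", []),
}


def merged_search(query):
    """Combine authority-based query expansion with plain text matching."""
    q = query.lower()
    # Find the heading, if any, that lists the query among its variants.
    heading = next((h for h, names in VARIANTS.items() if q in names), None)
    # With a heading, search the text for every variant; otherwise only the query.
    terms = VARIANTS[heading] if heading else [q]
    hits = []
    for doc_id, (text, subjects) in DOCS.items():
        by_text = any(term in text.lower() for term in terms)
        by_subject = heading in subjects
        if by_text or by_subject:
            hits.append(doc_id)
    return hits


print(merged_search("wwi"))
print(merged_search("rome"))
```

In this toy data a search for "wwi" picks up the uncataloged 1916 diary
through its text and the 1920 memoir through its heading, neither of
which a plain text search for "wwi" would find. A query with no entry
in the variant table simply falls back to ordinary text matching.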

Unfortunately, I fear that people would rather dump this kind of unique
access found in library catalogs instead of fixing it. And of course,
libraries are in a budget bind having to pay for RDA implementation,
which will have no effect on any of this.
-- 
*James Weinheimer* weinheimer.jim.l_at_gmail.com
*First Thus* http://catalogingmatters.blogspot.com/
*Cooperative Cataloging Rules*
http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts*
http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
Received on Thu Nov 08 2012 - 16:13:21 EST