Re: Relevancy-ranking LCSH?

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Thu, 8 Feb 2007 10:39:21 -0500
To: NGC4LIB_at_listserv.nd.edu
I think there are different uses a controlled vocabulary can be put to.
LCSH was, I believe, essentially designed for what I call 'single class
retrieval'. That is: Identify the class that best matches the subject
(or other category) you are looking for, find all the records posted to
that class.

Browsing is really another type of use; a controlled vocabulary not
designed to support browsing may not support it very well. I've looked
in the past for any research or writing on the functions of controlled
vocabularies, and more to the point, what features or aspects of a
controlled vocabulary serve what functions, but haven't been able to
find much. It may be that there are trade-offs, that a controlled
vocabulary optimized for one function will be un-optimized for another.

It's about technology only because computer technology makes it much
more feasible to provide a good interface for browsing a controlled
vocabulary. When LCSH was designed, it wouldn't necessarily have occurred
to anyone that browsing was even a function that could or should be
supported by a controlled vocabulary. (That's not entirely true: people
even in Dewey and Cutter's time addressed this to some extent. It has
historically been treated as a question of 'hierarchical classification
vs. alphabetic subject language', with the acknowledgment that
hierarchical classification is better for browsing than alphabetic
subject language, LCSH being an alphabetic subject language. I think
this discourse needs to be updated, in response not only to computerized
information retrieval but to another 50 years of experience since this
sort of theory was last in vogue.)

Jonathan

K.G. Schneider wrote:
>> But a fine-grained classification might not be exactly what we need,
>> for the purpose of ranking. The 3-digit Deweys might be just right
>> for ranking or grouping of results. Even these are outdated, yet this
>> level is more robust than the longer numbers.
>> The 3-digit Deweys might even be uncoupled from Dewey as such and
>> form the basis for an updated, new, very broad classification, just for
>> ranking and grouping. Most legacy data do have Deweys, so this would
>> be an obvious starting point. And where there is no Dewey, there's
>> probably an LCC, and it might be translated into an appropriate number.
>> Tables for that task exist.
>>
>> B.Eversberg
>>
>
> One of the things I noticed in managing a web portal for five years, which for
> most of that time had LCSH browsing, was that for discovery in the web
> environment, LC subject headings are inevitably too broad or too narrow.
> Some of this had to do with the small size of the database, but some of
> it, maybe most of it, had to do with LCSH simply not fitting well for
> collection-level browsing. Often the same items that yielded dead ends (or
> onesies, as I called them) in our database did pretty much the same
> elsewhere, as was true for the overbroad SH's. All this, and it was
> expensive, as well, and our own internally-created, ad hoc thesaurus did
> much better in retrieval evaluations and in usability testing.
>
> This is not to bash LCSH, but to observe about it what can be observed about
> most of our approach to classification (q.v. Diane and Karen's excellent
> piece about RDA): it's designed for 20th-century media. Shoehorning it onto
> the web just doesn't work. I'm not sure Dewey would, either, but at least
> Dewey, as a shelf inventory system, is designed to group like items for
> browsing. All LCSH really does is describe the item in hand. That may be
> important, but it doesn't mean that LCSH is thus suited for anything else.
>
> I had often hoped for a day when I could do an automated LCSH-Dewey
> crosswalk precisely for the reasons stated above, because I suspected that
> Dewey would do a better job (though necessarily labeled far differently than
> its numeric scheme). *Suspected,* anyway.
>
> K.G. Schneider
> kgs_at_bluehighways.com
>
>
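B. Eversberg's suggestion of truncating Dewey numbers to three digits for grouping and ranking results could be sketched roughly as follows. This is a minimal Python illustration under assumptions not stated in the thread: records are hypothetical (title, dewey) pairs, the 3-digit class is simply the first three digits of the call number, and groups are ranked by hit count. Records with no usable Dewey number go into an "unclassified" bucket rather than through an LCC-to-Dewey table, since no specific table is named here.

```python
import re
from collections import defaultdict

# Assumption: a Dewey call number starts with three digits, e.g. "025.431".
DEWEY_RE = re.compile(r"^\s*(\d{3})")

def dewey_class(call_number):
    """Return the 3-digit Dewey class ('025' from '025.431'), or None."""
    if not call_number:
        return None
    m = DEWEY_RE.match(call_number)
    return m.group(1) if m else None

def group_by_dewey3(records):
    """Group hypothetical (title, dewey) records by 3-digit Dewey class.

    Returns (class, titles) pairs ranked by group size, largest first;
    ties are broken by class label. Records without a parseable Dewey
    number are collected under 'unclassified' (an assumed placeholder
    for the LCC-to-Dewey mapping step, which is not implemented here).
    """
    groups = defaultdict(list)
    for title, dewey in records:
        key = dewey_class(dewey) or "unclassified"
        groups[key].append(title)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

For example, records with Dewey numbers 025.431, 025.04 and 025.3 would all fall into a single "025" group ranked ahead of a lone "020" record, while a record carrying only an LCC number such as QA76.9 would land in "unclassified".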

--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Received on Thu Feb 08 2007 - 09:37:05 EST