Re: Relevance ranking: was Aqua Brow

From: Kent Fitch <kent.fitch_at_nyob>
Date: Sun, 6 Jan 2008 17:36:49 +1100
To: NGC4LIB_at_listserv.nd.edu
On Jan 6, 2008 9:13 AM, Michael Fitzgerald <mike_at_jazzdiscography.com> wrote:
...
>
> A Google search for "cat" finds Caterpillar Inc., the UNIX cat
> command, the CAT fast ferry service, a character animation plug-in,
> the .cat (for Catalonia) Internet domain, the Citizens Area Transit
> .... [many other "cat but not feline" examples cited ]
> and that's just in the first 50 hits (my first page setting). Yes,
> there were things that did seem to be feline-related too.
>
> The "Searches related to cat" includes "cat's cradle, cat iim, common
> admission test, California achievment test" as well as things having
> to do with felines.
>
> It would seem that Jim's point is quite on-target.

Let's compare apples with apples:

http://catalog.princeton.edu/cgi-bin/Pwebrecon.cgi?Search_Arg=cat&Search_Code=SUBJ_&CNT=50&HIST=1

Extract of the results of this library "concept" search on "cat",
first number below is the hyperlinked count of titles (do the search
yourself!)

        1       Cat.    [if you click the "More Info 1" link, you'll see this:
                           "See: Civil Air Transport." Yet inexplicably, if
                           you click on the one and only title at Princeton
                           on the subject of "cat", you won't see anything
                           about air transport but "[Collection of pamphlets
                           on biology]" published in 1890, with lots about
                           felines. What is happening here?]
        0       CAT (Computer adaptive testing)
        0       Cat, Domestic
        0       Cat family (Mammals)
        0       Cat, Fat
        0       Cat food
        1       Cat Harbour, Newfoundland.
        1       Cat Island (Bahamas)--Social conditions.
        2       Cat Island (Miss.)
        2       Cat Island National Wildlife Refuge (La.)
        1       Cat lake.
        0       Cat-Mackiewicz, Stanisław, 1896-1966
        1       Cat owners--Anecdotes.
        4       Cat owners--Fiction.
        1       Cat owners--Manitoba--Winnipeg--Drama.
        1       Cat owners--Massachusetts--Anecdotes.
        1       Cat owners Mississippi Biography.
        1       Cat owners--Psychology.
        3       Cat owners United States Biography.
        0       Cat painting
        1       Cat people (Motion picture)
        0       CAT (Personality test)
        1       [cat:sg].
        1       Cat Spring (Tex.)
        1       Cat Stane (Lothian)
        0       CAT systems (Stenography)
        0       Catá, Alfonso Hernández, 1885-1940
        0       Cata (Micronesia)
        0       Catabaptists
        0       Catabathmus (Egypt)
        0       Catabolism
        1       Catabria (Spain)--Guidebooks.
        1       Catacaos (Peru)--History.
        1       Catacaos (Peru)--Social life and customs.
        0       Cataclysmic binary stars
...

So, lots of non-feline cats here, and not many titles on the few
feline cat topics anyway.

What you have to know, dear searcher, is to click the "More Info 3"
link/box under the "#" heading, to be taken to a page which includes
the text:

Reference Info
See: Cats

"Cats" is hyperlinked and takes you to a much better place to start
finding information!

I think most people would agree that the library concept search could
"easily" be made a lot better if the result list said, directly under
"Cat, Domestic": See instead "Cats".  So why doesn't it?  Why doesn't
this library care enough to make this happen?  (In)actions speak louder
than words.
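
A rough sketch of that idea, in Python, with the "See" reference shown
inline in the browse list instead of behind a "More Info" link. The
function name, the SEE_REFS table and the row layout are all
hypothetical; this is not how the Princeton catalogue is built, only an
illustration using the two references quoted above.

```python
# Hypothetical authority data: heading -> preferred heading ("See" reference).
# Both entries come from the browse results quoted in this message.
SEE_REFS = {
    "Cat.": "Civil Air Transport",
    "Cat, Domestic": "Cats",
}


def render_browse(rows, see_refs):
    """Format (count, heading) rows, showing any "See" reference inline."""
    lines = []
    for count, heading in rows:
        line = f"{count:>9}       {heading}"
        target = see_refs.get(heading)
        if target is not None:
            # Put the redirect where the searcher will actually see it.
            line += f'    See instead: "{target}"'
        lines.append(line)
    return lines


rows = [(1, "Cat."), (0, "Cat, Domestic"), (0, "Cat food")]
for line in render_browse(rows, SEE_REFS):
    print(line)
```

Headings without a reference, such as "Cat food", are printed unchanged.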

Many do care:
http://www.vufind.org/demo/Search/Home?lookfor=cat&type=all&submit=Find
http://catalogue.statelibrary.tas.gov.au/find/?q=cat
http://www2.lib.ncsu.edu/catalog/?Nty=1&N=0&Ntt=cat&Ntk=Keyword
http://books.google.com.au/books?q=cat&btnG=Search+Books
http://ll01.nla.gov.au/search.jsp?searchTerm=cat

I know lots about the last example - it is based on keyword searching
augmented with LCSH.  LCSH isn't completely useless - it can be very
useful for people well trained in its use (what percentage of the
population is that?  what is the cost of training the rest?) or for
algorithms able to mine its value.  Google get this LCSH information
in the data they harvest from many libraries.  Google are good at
algorithms.

Compare Google's effort with "cat"; here's Google's experimental
concept grouping search results:

http://www.google.com/search?hl=en&esrch=RefinementBarLhsGradientPreview&q=cat&btnG=Search

I agree that it is fairly silly to discuss what happens with a search
like "cat"; I just wanted to complete the comparison.

Let's try something more concept-based, say "cooking fish":

http://catalog.princeton.edu/cgi-bin/Pwebrecon.cgi?Search_Arg=cooking+fish&Search_Code=SUBJ_&CNT=50&HIST=1
-v-
http://www.google.com/search?hl=en&esrch=RefinementBarLhsGradientPreview&q=cooking+fish&btnG=Search

A reasonable conclusion is that concept searches using LCSH are very
fragile.  Unless searchers know the right term, get lucky, or have
lots of time, they are frequently frustrated and misled.
Keyword searches and full-text searches are more of a scatter-gun:
lots more recall, much less precision; they rely on augmenting metadata
(such as linking/citation data and counts, holdings and circulation
data, user ratings and tagging, and user history/profiling), source
analysis (where did the keyword occur?) and statistical analysis to
produce relevance ranking and result clustering.  These techniques
demonstrably work; people seeking information have voted with their
feet.
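
To make the combination concrete, here is a minimal Python sketch of
keyword matching weighted by where the term occurred (source analysis)
and boosted by circulation counts (augmenting metadata). The field
weights, the record layout and the sample records are invented for
illustration; real systems use far more signals and better statistics.

```python
import math

# Hypothetical field weights: a match in the title or subject counts for more
# than a match in the full text ("where did the keyword occur?").
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "fulltext": 1.0}


def score(record, query_terms):
    """Combine weighted keyword matches with a dampened popularity boost."""
    text_score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = record.get(field, "").lower().split()
        for term in query_terms:
            text_score += weight * words.count(term.lower())
    if text_score == 0:
        return 0.0  # popularity alone never makes a record a hit
    # log1p dampens circulation so popular items cannot swamp the text match.
    boost = 1.0 + math.log1p(record.get("circulation", 0))
    return text_score * boost


def rank(records, query):
    """Return matching records, best score first."""
    terms = query.split()
    scored = [(score(r, terms), r) for r in records]
    hits = [(s, r) for s, r in scored if s > 0]
    return [r for s, r in sorted(hits, key=lambda pair: pair[0], reverse=True)]


# Made-up sample records, for illustration only.
records = [
    {"title": "A fisherman's memoir",
     "fulltext": "we spent evenings cooking fish on the beach",
     "circulation": 5},
    {"title": "Cooking fish and shellfish",
     "subject": "Cooking (Fish)", "circulation": 40},
    {"title": "Cataclysmic binary stars",
     "subject": "Astronomy", "circulation": 100},
]

for record in rank(records, "cooking fish"):
    print(record["title"])
```

The heavily borrowed astronomy title never appears, because the
popularity boost only multiplies an existing text match.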

Regards,

Kent Fitch
Received on Sun Jan 06 2008 - 02:41:32 EST