On 05/08/07, Ted P Gemberling <tgemberl_at_uab.edu> wrote:
<snip>
> I work in a medical library, and a couple of days ago I cataloged a book
> on a physician in the 17th century, "Serjeant [sic] Surgeon John
> Knight."
</snip>
Hi Ted:
I'm going to jump on this because it's a useful opportunity, for me,
to start a thread on a feature that current catalogues offer and that
next-generation catalogues could, I hope, implement better.
I took a look at the UAB Lister Hill catalogue because I was
interested in how you or your catalog handled the variant spelling of
"sergeant" used in the title.
I found that you chose not to include the more common spelling,
"sergeant", anywhere in the MARC record, so there would be no direct
hits for someone searching from memory on "sergeant surgeon john
knight". This is consistent with what I found at a couple of other
libraries that had catalogued the same resource, so you're in good
company there!
When I tried searching for "sergeant surgeon john knight", I found
that your catalogue (Horizon?) was almost helpful:
1) it provides a phone number where people can get help;
2) it provides a linked search for "Did you mean?". Unfortunately, it
chose "sergent surgeon john knight" as the suggested search, which
also results in zero hits.
3) it provides a list of alternate possible search terms to use
instead of sergeant, of which "serjeant" is the second word listed.
Unfortunately, none of the terms is linked, so the user has to modify
the search terms manually to try them. And none of the terms shows
any explicit ranking or weighting, so there's no guarantee that using
any of them will actually result in a hit in the catalogue.
In my work with Evergreen, I've seen similar behaviour (suggested
search terms that result in no hits); it's great that it offers
spell-checking, but there's room for improvement in how it is surfaced
to the user. Maybe we can set requirements for how spell-checking
would ideally work in the next generation of catalogues; something
like:
1) Show suggested alternate terms only if using that term will
actually result in at least one hit in the catalog. Show estimated
hits that will result if that term is used in place of the incorrect
term.
1a) This gets complex if there are two or more suspected misspelled
words. Perhaps rank the suggested alternate phrases by the estimated
hits that will result from each combination?
2) Make each suggested term a clickable link. If the user searched for
a phrase, show the modified term highlighted in the context of the
entire phrase so it's clear that clicking the link will resubmit the
search for the entire phrase.
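
To make requirements 1, 1a and 2 a bit more concrete, here is a
minimal Python sketch. Everything in it is hypothetical: the toy
RECORDS index, the CANDIDATES table standing in for a spell-checker,
and the function names are my own inventions for illustration, not
part of any existing catalogue. It counts exact hits rather than
estimating them, which a real system would probably not be able to
afford for every combination.

```python
from itertools import product

# Hypothetical toy index: each record is just its set of keywords.
RECORDS = [
    {"serjeant", "surgeon", "john", "knight"},
    {"sergeant", "york", "biography"},
    {"surgeon", "general", "report"},
]

# Hypothetical spell-checker output: candidate replacements per word.
CANDIDATES = {
    "sergeant": ["sergent", "serjeant"],
}


def count_hits(terms):
    """Number of records containing every term (keyword AND search)."""
    return sum(1 for rec in RECORDS if all(t in rec for t in terms))


def suggest(query):
    """Return (hits, display phrase, terms) for each alternate phrase
    that yields at least one hit, ranked by hits, highest first."""
    words = query.lower().split()
    # Each word may stay as typed or be swapped for a candidate.
    options = [[w] + CANDIDATES.get(w, []) for w in words]
    suggestions = []
    for combo in product(*options):
        if list(combo) == words:
            continue  # skip the query the user already tried
        hits = count_hits(combo)
        if hits == 0:
            continue  # requirement 1: never suggest a dead end
        # Requirement 2: mark the changed word inside the whole phrase.
        shown = " ".join(
            f"*{new}*" if new != old else new
            for old, new in zip(words, combo)
        )
        suggestions.append((hits, shown, list(combo)))
    # Requirement 1a: rank combinations by the hits they produce.
    suggestions.sort(key=lambda s: s[0], reverse=True)
    return suggestions
```

With this toy data, suggest("sergeant surgeon john knight") yields a
single suggestion, "*serjeant* surgeon john knight" with 1 hit, and
the zero-hit "sergent" variant is never shown. In a real UI the
display phrase would be rendered as a link that resubmits the whole
phrase.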
Beyond this, what about including thesaurus-like capabilities for
increasing recall as well? Most dictionaries recognize variant
spellings of words and point towards the most common form; so taking
advantage of this capability and giving the user the opportunity to
broaden their search (by ORing variant terms and synonyms under the
covers) would seem to make a lot of sense. Recognizing homophones
("pale imitation" vs. "pail imitation") as a possible alternative
search direction would be another way to take advantage of this body
of knowledge.
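
As a rough illustration of the ORing idea, here is a small Python
sketch. The VARIANTS table, the toy RECORDS index and the "broaden"
flag are all assumptions made up for this example; a real variant
table would come from a dictionary or thesaurus. The key point is
that broadening is opt-in, so the default search stays precise.

```python
# Hypothetical variant-spelling table (would come from a dictionary).
VARIANTS = {
    "sergeant": {"serjeant"},
    "serjeant": {"sergeant"},
}

# Hypothetical toy index: each record is just its set of keywords.
RECORDS = [
    {"serjeant", "surgeon", "john", "knight"},
    {"sergeant", "york", "biography"},
]


def search(query, broaden=False):
    """Return indexes of matching records.

    Each query word becomes a group of acceptable spellings. A record
    matches when it contains at least one spelling from every group,
    that is, an AND of ORs. Without broaden, each group holds only the
    word as typed.
    """
    groups = []
    for word in query.lower().split():
        group = {word}
        if broaden:
            group |= VARIANTS.get(word, set())
        groups.append(group)
    return [
        i for i, rec in enumerate(RECORDS)
        if all(group & rec for group in groups)
    ]
```

Here search("sergeant surgeon john knight") finds nothing, while the
same call with broaden=True finds the "serjeant" record.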
Perhaps these steps towards improving the user experience for spell
checking and term-broadening are too basic to qualify for a
next-generation catalogue. But the current state of spell-checking in
today's catalogues is not impressive.
WorldCat does not seem to provide any spell-checking at all.
Koha (at least as implemented at Crawford County Federated Libraries)
might be going too far in its spell-checking implementation. A search
for "serjeant" returns 11 hits, none of which contain the term
"serjeant". "Ah", I thought, "Koha is using thesaurus expansion to
automatically search 'sergeant' as well!" But I thought wrong; a
search on "sergeant" returns 15 hits. More interesting, a search on
"sergant" returns 42 hits, most of them for "servant". A search on
"pail" returns hits for records that don't contain the word "pail" but
do contain "Paul" and "mail", so it seems as though some
single-character wildcarding is automatically being applied to the
default keyword search, with no feedback given to the user about what
happened to their search terms. For my taste, this trades away
precision too early in the search experience in return for gobs of
recall.
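
The results above are at least consistent with matching any indexed
word that differs from the search term in exactly one character
position. The Python sketch below shows that guess; it is only my
assumption about what might be happening, not Koha's actual code, and
the function names and vocabulary list are invented for the example.

```python
def one_substitution_apart(a, b):
    """True if a and b have equal length and differ in exactly one
    character position (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def fuzzy_matches(term, vocabulary):
    """Indexed words that a single-character wildcard would pull in."""
    return sorted(w for w in vocabulary if one_substitution_apart(term, w))


# Hypothetical slice of a catalogue's keyword vocabulary.
VOCABULARY = ["Paul", "mail", "pale", "servant", "sergeant"]
```

Under this guess, fuzzy_matches("pail", VOCABULARY) picks up "Paul"
and "mail", and fuzzy_matches("sergant", VOCABULARY) picks up
"servant" but not "sergeant", since the latter needs an inserted
letter rather than a substituted one. Returning the matched words like
this would also make it easy to tell the user which terms were
actually searched.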
I think we can work towards a UI experience for spell-checking in
catalogues that is biased towards precision, but enables the user to
quickly expand recall via spell-checking and thesaurus capabilities in
a helpful (that is, offer no suggestions that lead to zero hits) and
progressively disclosed (that is, leave the user in control of the
search session) manner.
--
Dan Scott
Laurentian University
Received on Mon Aug 06 2007 - 08:53:11 EDT