On 12/22/07, Nancy Cochran <nancy.cochran_at_earthlink.net> wrote:
> Considering Google - not sausage - what are the lines of code that
> normalize the feminine and the masculine; the singular and the plural; the
> past, present and future tense; capital letters and lower case characters;
> silly punctuation like an apostrophe to indicate ownership; parts of
> speech as simple as nouns and verbs? And then of course, how does Google
> get the differing language versions of the same universally used word?
>
> It is important to note that they often do. And they do it in the
> background.
A lot of that emerges from the sheer amount of data they have. How does
Google give you the right results even if you misspell a word? Quite
frequently it is because somebody else made the same misspelling as you
in some web page. Google does almost nothing with
natural language processing. [1]
<http://www.techcrunch.com/2007/12/18/googles-norvig-is-down-on-natural-language-search/>
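To make the "it comes from the data" point concrete, here is a minimal sketch of frequency-based spelling suggestion. It is not Google's actual code; the toy corpus, the function names, and the one-edit-distance rule are all assumptions made for illustration. The only "intelligence" is counting what other people have typed and preferring the more common form.

```python
from collections import Counter

# Hypothetical toy corpus standing in for text harvested from web pages.
# Note that it contains a misspelling ("libary"), just as real pages do.
CORPUS = """
the library catalog helps patrons find books
the libary catalog is hard to search
search the library catalog for books
library patrons search the catalog
"""

# Word frequencies are the only "knowledge" this sketch has.
COUNTS = Counter(CORPUS.lower().split())


def edits1(word):
    """Return every string that is one simple edit away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, counts):
    """Pick the most frequent known word at most one edit from `word`.

    Falls back to the input (lowercased) when the data has nothing nearby.
    """
    word = word.lower()
    candidates = {w for w in edits1(word) if w in counts}
    if word in counts:
        candidates.add(word)
    if not candidates:
        return word
    return max(candidates, key=lambda w: counts[w])


if __name__ == "__main__":
    for query in ("libary", "catalgo", "xyzzy"):
        print(query, "->", suggest(query, COUNTS))
```

No grammar, dictionary, or part-of-speech rules appear anywhere in the sketch. With more data the suggestions get better without the code changing at all.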
> I don't know how to find appropriate search terms, but I submit that this
> is the challenge for people who care about good search.
I see the challenge as being how to give people what they want without
forcing them to change their search strategy at all. This idea that we
have to or should teach library patrons our controlled vocabularies and the
idiosyncrasies of our systems and practices is a very damaging one, in my
opinion. You shouldn't have to be inducted into the library information
priesthood just to use the catalog. People don't want the catalog to help
them find the right words; they want it to help them find the right books.
A Google-like catalog would have to be powered by an insane amount of
unstructured "wiggly" data supplied by regular users, with a lot of the
power and flexibility coming from emergence [2]. Doing more with the data
we've got is well and good, but we need more data more than we need smarter
software.
With apologies to William S. Burroughs, library software is the ultimate
merchandise. The library software merchant does not sell his product to the
library, he sells the library to the product. He does not improve and
simplify his merchandise, he degrades and simplifies the patron.
--Casey
[1] http://www.technologyreview.com/Infotech/19868/page2/
[2] http://www.librarything.com/tag/emergence
Received on Sun Dec 23 2007 - 19:21:36 EST