Re: Relevance ranking: was Aqua Brow

From: Rob Styles <rob.styles_at_nyob>
Date: Fri, 4 Jan 2008 10:04:33 +0000
To: NGC4LIB_at_listserv.nd.edu

On 4 Jan 2008, at 08:27, Weinheimer Jim wrote:
> What I'm getting at is a primary difference in the way Google and
> library catalogs search: in Google (et al.) you necessarily search
> "text," perhaps with fuzzy searching and other ingenious methods,
> but in the library catalog, you can search "concepts." This is how
> the catalog was designed. You can search the entire concept of
> Dostoyevsky, or Tolstoy, or WWI, or anything according to Cutter's
> rules from so long ago: to find what the library has by their
> authors, titles, and subjects. This *cannot be done* in Google
> because you are searching only text.

I'm sorry Jim, but you are quite wrong in this assertion that Google
searches only text and that library catalogs represent concepts. The
broad PageRank technology is discussed in detail on Wikipedia
(http://en.wikipedia.org/wiki/PageRank). You yourself cite Google
bombs, which by their very nature show how Google is searching exactly
the concepts you suggest. (See http://en.wikipedia.org/wiki/Google_bomb
for those wanting more on Google bombs.)

Google weights the phrases used to link to a page (the anchor text)
very highly, alongside PageRank itself; that's how it can match on
text that does not appear in the result. This is no different from
acting on authority data - it's all metadata outside of the result
itself. Another way of thinking about that is to look at your
authority records - they're just text too.
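
To make the anchor text point concrete, here is a toy sketch in
Python. It is not how Google actually does it; the page URLs, the
anchor_weight value and the function names are all made up for
illustration. It only shows that a page can be found by a word that
appears in the links pointing at it and nowhere in the page itself.

```python
from collections import defaultdict

# Hypothetical toy corpus: page bodies keyed by URL.
pages = {
    "a.example/bio": "Born in Moscow in 1821, he wrote Crime and Punishment.",
    "b.example/reading-list": "Novels worth your time this winter.",
    "c.example/blog": "Some notes on nineteenth century fiction.",
}

# Hypothetical links: (source page, anchor text, target page).
links = [
    ("b.example/reading-list", "Dostoyevsky biography", "a.example/bio"),
    ("c.example/blog", "life of Dostoyevsky", "a.example/bio"),
]


def tokenize(text):
    """Lowercase the text and strip simple trailing punctuation."""
    return [word.strip(".,").lower() for word in text.split()]


def build_index(pages, links, anchor_weight=3.0):
    """Map each term to {url: score}.

    Body terms are credited to the page they appear on. Anchor terms
    are credited to the page being linked TO, with an assumed higher
    weight (anchor_weight is an arbitrary illustrative value).
    """
    index = defaultdict(lambda: defaultdict(float))
    for url, body in pages.items():
        for term in tokenize(body):
            index[term][url] += 1.0
    for _source, anchor, target in links:
        for term in tokenize(anchor):
            index[term][target] += anchor_weight
    return index


def search(index, query):
    """Return URLs ranked by summed term scores, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    return sorted(scores, key=scores.get, reverse=True)


index = build_index(pages, links)
results = search(index, "Dostoyevsky")
print(results)
```

The biography page never mentions the name "Dostoyevsky", yet it is
the only result for that query, because two other pages describe it
that way in their links. That is metadata held outside the record,
much like an authority heading.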

> So, after I explain all of this, I ask them again: why are they
> happy with their Google search? The answer is: they thought they
> had done a concept search for Dostoyevsky, when they have only
> searched the text. They also thought that they received the most
> "relevant" items, when in actuality they are looking at the items
> that have the most links to them, or the most cited items. This
> does not mean that the items they are looking at are the most
> "relevant" items, at least not in the normal meaning of the term.
> At the end of the exercise, they are much more skeptical of Google
> results.

I can't help thinking that perhaps you are asking the wrong question
here. You ask Google about 'dostoyevsky'. Without any additional
information it infers that you are asking about the Russian author
and presents you with a page full of results about him - primarily
summaries about him, his writing and the period in history, as well as
a lot of detail on where to find more information.

What question was it that you were trying to answer about Dostoyevsky
when starting the search? When he was born? What he wrote? What
question does it fail to answer in the first page of results? Knowing
that would really help in knowing how to build a better search tool.

rob

Rob Styles
Programme Manager, Data Services, Talis
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
direct: +44 (0)870 400 5004
mobile: +44 (0)7971 475 257
msn: mmmmmrob_at_yahoo.com
blog: http://www.dynamicorange.com/blog/
irc: irc.freenode.net/mmmmmrob,isnick
Received on Fri Jan 04 2008 - 04:57:46 EST