One other piece of reading here, if you haven't seen it: "Google Book
Search: Document Understanding on a Massive Scale", by Luc Vincent (head
of Google OCR-related initiatives, amongst other things), which is at
http://www.icdar2007.org/ICDAR2007_KeyNote_LVincent.pdf
It's his keynote at the ICDAR conference, and it isn't long or technical,
so it's worth a quick look. The bit relevant to the relevance ranking
debate is:
"Beyond document image processing, OCR, volume level understanding and
indexing, another important topic has kept the Google Book Search team
busy, namely ranking. Specifically, how should books that match a
particular query be ranked? The web is notorious for its rich graph of
hyperlinks, famously exploited by Google's PageRank algorithm [6]. This
structure applies somewhat to technical publications, which typically
contain numerous references to other technical publications. However the
universe of books is different and most books (eg, novels) do not
contain any references. Novel approaches therefore had to be developed,
exploiting an array of new signals. Additionally, these techniques were
recently extended to allow "blending" of book search results with web
search results when appropriate."
Unfortunately this doesn't shed light on which 'new signals' Google
is looking to exploit.
Owen
Received on Fri Jan 04 2008 - 11:51:18 EST