-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of James Weinheimer
Sent: Monday, August 29, 2011 8:57 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] How Google makes improvements to its search algorithm
On 29/08/2011 14:31, Meloni, Julie (jcm7sb) wrote:
<snip>
Training is done in a custom system using already-rated search results;
if you get X number correct, you can move on in the process. All ratings
have several sets of eyes on them, and even more if the ratings differ
(say, between a 3 and a 5 on a 5-point scale). There is room (and a
requirement) to argue for your rating in that situation. In my
experience, fellow raters were educated, tech-savvy individuals with the
ability to make logical arguments; they look for people with broad
knowledge, since you have to be able to rate results for Lady Gaga,
tsunamis, cricket results, and space exploration equally well (as an
example).
</snip>
> That is very interesting! You mentioned "if you get X number correct".
> In your opinion, was it pretty clear what was "correct" and what was
> "incorrect"?
Inasmuch as you can be "correct" when holistic scoring is involved, yes. :) At least in the training test... there were some examples whose scores were completely debatable, in which case the test was whether or not you could articulate why you gave the score you did.
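If it helps to make that adjudication step concrete -- the "several sets of eyes" and the argue-your-rating requirement from the snip above -- here is a minimal Python sketch. Every name, and the 2-point disagreement threshold, is an assumption of mine for illustration only; this is not Google's actual tooling.

    # Hypothetical sketch: collect each rater's 1-5 score for a
    # (query, url) pair and flag the pair for discussion when the
    # scores diverge too far (say, a 3 vs. a 5). All names and the
    # threshold below are assumptions, not the real system.
    from collections import defaultdict

    DISAGREEMENT_THRESHOLD = 2  # assumed: a 3-vs-5 spread triggers review

    ratings = defaultdict(list)  # (query, url) -> list of rater scores

    def record_rating(query, url, score):
        if not 1 <= score <= 5:
            raise ValueError("scores run from 1 (spam) to 5 (top result)")
        ratings[(query, url)].append(score)

    def needs_adjudication(query, url):
        scores = ratings[(query, url)]
        return len(scores) > 1 and max(scores) - min(scores) >= DISAGREEMENT_THRESHOLD

    record_rating("mona lisa", "http://en.wikipedia.org/wiki/Mona_Lisa", 5)
    record_rating("mona lisa", "http://en.wikipedia.org/wiki/Mona_Lisa", 3)
    print(needs_adjudication("mona lisa", "http://en.wikipedia.org/wiki/Mona_Lisa"))  # True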
> For instance, a query of "Mona Lisa" that retrieved a
> resource on herding reindeer in Finland could be labelled incorrect
> pretty safely.
Yep.
> But determining what would be "correct" would seem to be
> more difficult: the painting or the song, or perhaps some words from a
> poem. For example, when evaluating the search "Mona Lisa" how would a
> high ranking of a page about Nat King Cole be considered? Or is this not
> the way it works?
That _is_ the way it works. And where "Mona Lisa" (the song) first appears now in the results for a plain "Mona Lisa" search, with no additional search parameters (a YouTube clip somewhere around the middle of the third page), is about right.
HYPOTHETICALLY (ahem), the way it could work is that a rater might see:
Search Term: "mona lisa"
Search Result URL: http://en.wikipedia.org/wiki/Mona_Lisa
Then be asked to give it a rank from 1 to 5, where:

5 = "yes, this should be the top result for this search term"
4 = "should be pretty high; might not be the best, but there might not be a best one"
3 = "sure, that should be in the mix"
2 = "getting close to irrelevant"
1 = "completely incorrect, or spam/malware/bad bad place"

You would have to take into consideration what the average user would be looking for when entering those terms in a search engine. Would the average user expect something about "Mona Lisa" (the song) at the top of the list for just "Mona Lisa", or would a result about the song only appear high in the list if someone searched for "Mona Lisa song"?
For Search Term: "mona lisa"
Search Result URL: http://en.wikipedia.org/wiki/Mona_Lisa_(Nat_King_Cole_song)
one might consider that only a 3.

But for Search Term: "mona lisa song"
Search Result URL: http://en.wikipedia.org/wiki/Mona_Lisa_(Nat_King_Cole_song)
one might consider that a 4 or more.
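Purely as illustration: the same URL earns a different score under different queries, because relevance is judged against what the average user typing that query probably wants. A tiny Python sketch of that point, with scores that just restate the hypothetical above -- nothing here comes from a real rating system:

    # Toy illustration of query-dependent relevance. The scores restate
    # the hypothetical example above; they are assumptions, not data
    # from any real rating system.
    SONG_URL = "http://en.wikipedia.org/wiki/Mona_Lisa_(Nat_King_Cole_song)"
    PAINTING_URL = "http://en.wikipedia.org/wiki/Mona_Lisa"

    hypothetical_ratings = {
        ("mona lisa", PAINTING_URL): 5,   # what most users mean
        ("mona lisa", SONG_URL): 3,       # "sure, in the mix"
        ("mona lisa song", SONG_URL): 4,  # now it matches the intent
    }

    def rating_for(query, url):
        return hypothetical_ratings.get((query, url))

    print(rating_for("mona lisa", SONG_URL))       # 3
    print(rating_for("mona lisa song", SONG_URL))  # 4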
I also find it all very fascinating, and I loved being able to use it, when teaching Rhetoric of Information, as an example of just how much _people_ are behind the machine -- students "got" that, to some extent.
- Julie
Julie Meloni
Lead Technologist/Architect, Online Library Environment
University of Virginia Library
jcmeloni_at_virginia.edu // 434-243-1974