Re: The next generation of discovery tools (new LJ article)

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Mon, 28 Mar 2011 12:19:57 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

No. You just said the opposite of what I tried to say, Eric, while 
implying you were agreeing with me. Clearly I'm having trouble being clear.

What I'm saying is I would not _assume_ that the TF-IDF score 
distribution has the shape of a long tail, just-because/even-if user 
perception of relevancy assigned to the docs has a shape of a long tail.

A TF-IDF relevancy ranking that puts things in an order that often
matches a user-assigned order does NOT necessarily also assign absolute
values that match user-assigned values.
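
To make that distinction concrete, here is a minimal sketch in Python. The numbers are made up for illustration (loosely in the spirit of the example lists quoted further down), not measured from any real index, and the helper names `ranking` and `normalized` are hypothetical. It only shows that two score lists can produce the identical ordering while their normalized values sit far apart.

```python
# Hypothetical numbers, for illustration only: not measured from any real index.
user_value = {"a": 100, "b": 98, "c": 87, "d": 54, "e": 35, "f": 12, "g": 4, "h": 1}
engine_score = {"a": 100, "b": 70, "c": 69, "d": 68, "e": 67, "f": 66, "g": 30, "h": 10}


def ranking(scores):
    """Return document ids sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def normalized(scores):
    """Scale scores so the top document gets 1.0."""
    top = max(scores.values())
    return {doc: value / top for doc, value in scores.items()}


# Both lists put the documents in exactly the same order...
same_order = ranking(user_value) == ranking(engine_score)

# ...but the values attached to an individual document can differ a lot.
user_norm = normalized(user_value)
engine_norm = normalized(engine_score)
max_gap = max(abs(user_norm[doc] - engine_norm[doc]) for doc in user_value)

print("same order:", same_order)
print("largest per-document gap after normalizing:", round(max_gap, 2))
```

With these made-up lists the order agrees exactly, yet document "f" is worth about 0.12 of the top document to the user and about 0.66 of the top score to the engine.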

Eric, do you have particular experience with TF-IDF that says you often 
get a long tail in the actual scores, not just in the user perception of 
value?  Because some actual data would be welcome.

I have not looked at my own numbers. It would take a buncha work to
get some numbers on a graph, which I don't have the need or time for right now.

But on the Solr list, when people ask questions that have that as an 
assumption -- like "How can I exclude the 'poorest' scored results from 
my result list?" -- the answer from the Solr experts is generally that 
you can't just take some arbitrary score as a cut off, because the score 
has no objective meaning, and will vary from query to query and index to 
index.  As I quoted before:

"Scores for results for a given query are only useful in comparison to other results for that exact same query. Trying to compare scores across queries or trying to understand what the actual score means (i.e. 2.34345 for a specific document) may not be an effective exercise."
http://lucidworks.lucidimagination.com/display/LWEUG/Understanding+and+Improving+Relevance

Now, that doesn't directly answer the question. Scores MIGHT have a 
"long tail" distribution when user perception of value has a long tail 
distribution.  But I wouldn't bet on it, and I certainly would not 
_assume_ it. So far this whole discussion seems, to me, to be just
people assuming it; nobody has any data.  It is not a safe assumption.
Relevancy scores succeed at putting things in the right _order_ to match 
user perception of value (much of the time, not for 100% of users and 
searchers of course).  That does NOT mean that the relationship between 
individual document scores matches user perception of value.
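
If someone does want to get data, one rough way to check a score list for a Zipf-like shape is to fit a line to log(score) against log(rank). The sketch below assumes you have already pulled the scores for one query out of your index as a plain list of positive numbers; the function name `loglog_slope` and both sample lists are hypothetical, not real results. A slope near -1 with a good fit would be consistent with a Zipfian tail, but a slope by itself proves nothing.

```python
import math


def loglog_slope(scores):
    """Least-squares slope of log(score) against log(rank), ranks starting at 1.

    Assumes every score is positive, since log(0) is undefined.
    """
    ordered = sorted(scores, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(score) for score in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical score lists, not taken from any real index.
zipf_like = [100 / rank for rank in range(1, 11)]    # score proportional to 1/rank
plateau = [100, 70, 69, 68, 67, 66, 65, 64, 30, 29]  # same kind of ordering, different shape

print("zipf-like slope:", round(loglog_slope(zipf_like), 3))
print("plateau slope:  ", round(loglog_slope(plateau), 3))
```

Running this over the scores for a number of real queries, and plotting them, would be the actual data the discussion is missing.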

On 3/28/2011 11:07 AM, Eric Lease Morgan wrote:
> On Mar 28, 2011, at 10:58 AM, Jonathan Rochkind wrote:
>
>> It is true that the _user experience_ of TF-IDF type algorithm ranking
>> is often that you get a few highly relevant results, and then the
>> results trail off into around-equally-non-relevant...
>>
>> Even though your _evaluation_ of relevance might look like:  100, 98,
>> 87, 54, 35, 12, 4, 1, 1, 1, 1, 1, 1, 1, 1,
>>
>> The actual numbers might look like:
>>
>> 100, 70, 69, 68, 67, 66, 65, 64, 30, 29, 28, 27, 26, 10, 9, 8, 7
>
>
> Yes. If I understand the question correctly, then the TFIDF scores associated with any given search result can be described as having the shape of a "long tail", or, put another way, have a Zipfian distribution.
>