Re: The next generation of discovery tools (new LJ article)

From: Eric Lease Morgan <emorgan_at_nyob> Date: Mon, 28 Mar 2011 11:07:08 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

On Mar 28, 2011, at 10:58 AM, Jonathan Rochkind wrote:

> It is true that the _user experience_ of TF-IDF type algorithm ranking 
> is often that you get a few highly relevant results, and then the 
> results trail off into around-equally-non-relevant...
> 
> Even though your _evaluation_ of relevance might look like:  100, 98, 
> 87, 54, 35, 12, 4, 1, 1, 1, 1, 1, 1, 1, 1,
> 
> The actual numbers might look like:
> 
> 100, 70, 69, 68, 67, 66, 65, 64, 30, 39, 28, 27, 26, 10, 9, 8, 7

Yes. If I understand the question correctly, the TF-IDF scores associated with any given set of search results can be described as having the shape of a "long tail" or, put another way, as following a Zipfian distribution.
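
To make the "long tail" shape concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name zipf_scores, the top score of 100, and the exponent of 1 are hypothetical choices, not values taken from any particular search engine or TF-IDF implementation.

```python
def zipf_scores(n, top=100.0, exponent=1.0):
    """Return n idealized scores where the score at rank r is top / r**exponent.

    Assumption: an idealized Zipf-like curve, not real TF-IDF output.
    """
    return [top / (rank ** exponent) for rank in range(1, n + 1)]


if __name__ == "__main__":
    # Print the first ten scores, rounded for readability.
    print([round(score, 1) for score in zipf_scores(10)])
```

The scores drop steeply over the first few ranks and then flatten out, which is the shape described above.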

-- 
Eric Morgan
University of Notre Dame