On Mar 28, 2011, at 10:58 AM, Jonathan Rochkind wrote:
> It is true that the _user experience_ of TF-IDF type algorithm ranking
> is often that you get a few highly relevant results, and then the
> results trail off into around-equally-non-relevant...
>
> Even though your _evaluation_ of relevance might look like: 100, 98,
> 87, 54, 35, 12, 4, 1, 1, 1, 1, 1, 1, 1, 1,
>
> The actual numbers might look like:
>
> 100, 70, 69, 68, 67, 66, 65, 64, 30, 29, 28, 27, 26, 10, 9, 8, 7
Yes. If I understand the question correctly, the TF-IDF scores associated with any given set of search results can be described as having the shape of a "long tail" or, put another way, as having a Zipfian distribution.
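
For what it's worth, here is a minimal sketch of how that shape shows up. It assumes a plain TF-IDF sum (relative term frequency times log(N/df)), which is not necessarily what any particular search engine computes. The corpus, the query, and the function names tfidf_scores and ranked are all made up for illustration, and a corpus this small only hints at the shape rather than demonstrating a true Zipfian distribution.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for this illustration.
DOCS = [
    "zipf zipf zipf law",
    "zipf law of word frequency",
    "word frequency in large text collections",
    "library catalog search results",
    "relevance ranking in library search",
    "catalog records and metadata",
]


def tfidf_scores(query, docs):
    """Score each document against the query with a plain TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Skip empty documents and terms that occur nowhere.
            if tokens and df[term]:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


def ranked(query, docs):
    """Return (score, document) pairs, highest score first."""
    return sorted(zip(tfidf_scores(query, docs), docs), reverse=True)


for score, doc in ranked("zipf law", DOCS):
    print(f"{score:.3f}  {doc}")
```

With this corpus, the two documents that mention the query terms score above zero and everything else ties at zero: a short head followed by a flat tail.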
--
Eric Morgan
University of Notre Dame
Received on Mon Mar 28 2011 - 11:07:36 EDT