Re: The next generation of discovery tools (new LJ article)

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Wed, 30 Mar 2011 16:49:37 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
On 3/30/2011 11:33 AM, Till Kinstler wrote:
> 1/0) dates back to the 1970s. And some conclusions from that made it
> even into libraryland as early as the 1980s (see, for example, writings
> by Charles Hildreth, one article from 1987 even being titled "Beyond
> Boolean: ...").

Thanks for the pseudo-cite, I'll track it down and add it to my white 
paper trying to explain the point of relevancy ranking in a library 
context:  
http://bibwild.wordpress.com/2011/03/28/information-retrieval-and-relevance-ranking-for-librarians/

If you have any other such cites, feel free to share (I'm too lazy to do 
the research myself right now, or at any rate I don't think it's needed 
for the intended audience of the paper). Also interested in your opinion 
of my essay in general, Tim.

> Can we, instead of discussing the usefulness of relevance ranking over
> and over again (for, it seems, at least about 25 years, I think, all has
> been said), perhaps just start doing and improving it? I mean, we do in
> some way, driven by products from vendors,

Certainly some of us ARE doing that.

Although I have to admit, I don't think my time is particularly 
efficiently spent trying to improve Lucene's relevancy ranking 
algorithm itself -- I'm not a mathematical programming type of guy, 
and even if I were, I doubt I could improve upon Lucene.

Instead, I spend my time (as many of us do) trying to configure the 
boosting parameters and such optimally for our use cases and 
databases. Naomi Dushay has some slides and a blog post about writing 
automated tests for your Solr relevancy ranking setup, so that when 
you're tweaking to rank better for new examples, you can tell whether 
you're breaking the examples that already worked.
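
To make that concrete, here's a rough sketch of what such an automated 
relevancy test can look like, in Python against a hypothetical Solr 
core -- the core URL, field boosts, sample queries, and expected record 
ids are all made up for illustration, not taken from Naomi's setup or 
anyone's actual catalog:

    # Rough sketch of an automated relevancy test against a Solr index.
    # NOT anyone's actual test suite -- the core URL, field names, boost
    # weights, and sample queries below are made-up placeholders.
    import json
    import urllib.parse
    import urllib.request

    SOLR_SELECT = "http://localhost:8983/solr/catalog/select"  # hypothetical core

    def search(query, rows=10):
        """Run an edismax query with (hypothetical) field boosts, return doc ids."""
        params = urllib.parse.urlencode({
            "q": query,
            "defType": "edismax",
            "qf": "title^10 author^5 subject^2 text",  # example boosts to tune
            "rows": rows,
            "fl": "id",
            "wt": "json",
        })
        with urllib.request.urlopen(f"{SOLR_SELECT}?{params}") as resp:
            docs = json.load(resp)["response"]["docs"]
        return [d["id"] for d in docs]

    # Each case says: for this query, this known-good record should rank
    # in the top N. Re-run the whole suite after every boost tweak.
    CASES = [
        ("pride and prejudice", "rec0001", 3),
        ("introduction to algorithms cormen", "rec0042", 5),
    ]

    for query, expected_id, top_n in CASES:
        ids = search(query, rows=top_n)
        status = "PASS" if expected_id in ids else "FAIL"
        print(f"{status}: '{query}' -> expected {expected_id} in top {top_n}, got {ids}")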

Work is being done.

I think we could usefully spend more time, not trying to improve the 
relevancy ranking itself, but trying to improve the tools and UI we 
provide for increasing the precision of searches that end up too 
low-precision/high-recall even with relevancy ranking (rather than 
changing the ranking -- I think the ranking actually does okay as is). 
The "facet limit" tools we all provide are one such technique, but I 
think we can make em work better and be more powerful without being 
more confusing. I've got some ideas I'll save for another time. 
(Haven't gotten to em yet because they will require a bit of Solr Java 
hacking.)
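
For what it's worth, here's a rough sketch of the basic facet-limit 
mechanics in Solr terms, just to ground the discussion -- the Solr URL 
and facet field names are hypothetical placeholders, not any particular 
catalog's schema, and this is the existing technique, not the new ideas 
I'm sitting on:

    # Rough sketch of the "facet limit" idea: take an over-broad search
    # and narrow it by applying a facet value as a filter query. The Solr
    # URL and field names are hypothetical placeholders.
    import json
    import urllib.parse
    import urllib.request

    SOLR_SELECT = "http://localhost:8983/solr/catalog/select"

    def select(q, filters=()):
        params = [
            ("q", q),
            ("wt", "json"),
            ("rows", 10),
            ("facet", "true"),
            ("facet.field", "format"),        # e.g. Book / Journal / Video
            ("facet.field", "subject_topic"),
            ("facet.mincount", 1),
        ]
        params += [("fq", f) for f in filters]  # each facet the user picks becomes an fq
        url = SOLR_SELECT + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    # Broad, high-recall query first...
    broad = select("civil war")
    print("total hits:", broad["response"]["numFound"])
    print("format facet:", broad["facet_counts"]["facet_fields"]["format"])

    # ...then the user picks a facet value and we re-run with a filter
    # query, which raises precision without touching the ranking at all.
    narrowed = select("civil war", filters=['format:"Book"'])
    print("narrowed hits:", narrowed["response"]["numFound"])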