Re: Cablegate from Wikileaks: a case study

From: Lovins, Daniel <daniel.lovins_at_nyob>
Date: Tue, 7 Dec 2010 09:07:27 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU
Kyle, 

Point well taken. 'Objectivity' may not have been the most helpful word to use. And I agree that preprocessing and normalization make a difference, as does the scoring of fields at index and query time. But my broader point is that libraries have no interest in hiding these data processing steps, and so leave them open to outside criticism. This is simply not the case for proprietary ranking algorithms.
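
To make this concrete (a minimal sketch, not a prescription: the index path and query string are invented, and I'm assuming a Lucene 3.x setup), anyone can ask Lucene to show its work for a given hit via IndexSearcher.explain(), which prints the complete score derivation, boosts and normalization factors included:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;
    import java.io.File;

    public class ExplainScores {
        public static void main(String[] args) throws Exception {
            // Open an existing index (the path here is hypothetical).
            IndexReader reader = IndexReader.open(
                    FSDirectory.open(new File("/var/lib/solr/index")));
            IndexSearcher searcher = new IndexSearcher(reader);

            Query query = new QueryParser(Version.LUCENE_30, "text",
                    new StandardAnalyzer(Version.LUCENE_30)).parse("cablegate");

            TopDocs hits = searcher.search(query, 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                // explain() returns the full scoring tree for this document:
                // term frequency, inverse document frequency, boosts, norms.
                Explanation exp = searcher.explain(query, hit.doc);
                System.out.println(exp.toString());
            }
            searcher.close();
        }
    }

Solr exposes the same information over HTTP with debugQuery=true. The point is not that the weights are neutral, only that the whole calculation is available for inspection.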

Daniel

> -----Original Message-----
> From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On
> Behalf Of Kyle Banerjee
> Sent: Monday, December 06, 2010 6:05 PM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: Cablegate from Wikileaks: a case study
> 
> > Library search algorithms, by contrast, especially when harnessed to open
> > source search engines like Lucene/Solr, can be verified by others for
> > accuracy and objectivity.
> >
> >
> Not really.
> 
> You need to preprocess/normalize data before indexing it, which makes a huge
> difference in the results. Also, these indexing engines allow you to assign
> more or less weight to different components upon ingest or even during the
> search itself. Altering even one weighting factor by a fraction of a point
> can change the rankings dramatically.
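> 
> To illustrate (the field name, text, and boost values are made up), here's
> index-time boosting in Lucene 3.x:
> 
>     import org.apache.lucene.document.Document;
>     import org.apache.lucene.document.Field;
> 
>     Document doc = new Document();
>     Field title = new Field("title", "Cablegate: a case study",
>             Field.Store.YES, Field.Index.ANALYZED);
>     title.setBoost(2.0f);  // nudge this to 2.2f and the top hits can reorder
>     doc.add(title);
>     // then writer.addDocument(doc), given an open IndexWriter
> 
> The same knob exists per search: DisMax's qf parameter takes boosts like
> qf=title^2.0 subject^1.5 text^1.0.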
> 
> Setting these values is more art than science, and the actual numbers are
> irrelevant outside the context of the specific application at hand, as they
> are ultimately based on what you know about the resources and the people
> using them.
> 
> This means the values can be regarded as arbitrary: when you don't get the
> results you want, you tweak the values until you do (see the one-liner
> below). Since things constantly change, the fine-tuning process never ends.
> There's no practical way to make it objective.
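> 
> For example (term and boost invented), the runtime version of that tweak is
> a one-liner:
> 
>     import org.apache.lucene.index.Term;
>     import org.apache.lucene.search.TermQuery;
> 
>     TermQuery titleQuery = new TermQuery(new Term("title", "cablegate"));
>     titleQuery.setBoost(1.4f);  // 1.3f felt right last month; who knows now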
> 
> kyle
Received on Tue Dec 07 2010 - 09:07:52 EST