Re: How Google makes improvements to its search algorithm

From: Jimmy Ghaphery <jghapher_at_nyob>
Date: Wed, 31 Aug 2011 09:17:05 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

I am fascinated by the notion of imprecise or custom search results and 
the way it challenges our expectations in libraries.

An important aspect to the appropriateness of fuzzy results is the 
characteristics of the underlying data. In the case of Google we are 
talking about a huge data set that can at best be loosely corralled. In 
this context, using additional data such as usage patterns and 
geographic location of the searcher makes perfect sense to me. For a 
scientist searching a genomic database, it makes sense that results need 
to be predictable and repeatable.
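To make the contrast concrete, here is a minimal sketch of how contextual signals might blend into a relevance score, with a switch for the deterministic mode a scientist would want. This is not Google's actual algorithm; the signal names and weights are illustrative assumptions only:

```python
import math

def score(base_relevance, usage_count=0, same_region=False,
          personalize=True):
    """Return a ranking score for one search result.

    base_relevance -- text-match score from the index (deterministic)
    usage_count    -- how often this record has been used (a fuzzy signal)
    same_region    -- whether the record is popular near the searcher
    personalize    -- False gives the predictable, repeatable mode
    """
    if not personalize:
        # Deterministic: same query + same index = same ranking, always.
        return base_relevance
    # Fuzzy: log-dampened usage boost plus a small geographic nudge.
    # The 0.2 and 0.1 weights are arbitrary placeholders.
    boost = 0.2 * math.log1p(usage_count) + (0.1 if same_region else 0.0)
    return base_relevance * (1.0 + boost)
```

With `personalize=False` the same query over the same index always ranks identically; with it on, two searchers running the same query can legitimately see different orderings.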

It is not crystal clear to me where library data might fit along this 
continuum. Considering the potential scope of the next generation 
catalog I do think we need to embrace notions of rich algorithms and 
rapid iteration to tease out relevant results. In reality our results 
change every day that we add records (sometimes radically if we are bulk 
loading). How scientific do we need to be here? Do we entertain requests 
from a researcher who wants to see results from our previous system, or 
the results we presented for a search even a year ago?
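Honoring that kind of request would mean logging, for every search, which index snapshot and which ranking configuration were live at the time, so the search can be replayed later against the archive rather than today's index. A hedged sketch of that idea, with all names hypothetical (no real catalog exposes this API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchRecord:
    """What we'd have to log per search to make it reproducible."""
    query: str
    index_snapshot: str   # e.g. a dated dump of the catalog records
    ranking_version: str  # which algorithm/weights were live that day

def replay(record, snapshots, rankers):
    """Re-run a past search against the archived snapshot and ranker.

    snapshots -- dict mapping snapshot id -> list of record titles
    rankers   -- dict mapping version id -> ranking function
    """
    index = snapshots[record.index_snapshot]
    rank = rankers[record.ranking_version]
    # Toy matcher: substring match stands in for real retrieval.
    hits = [title for title in index if record.query.lower() in title.lower()]
    return rank(hits)
```

Replaying against the archived snapshot, not the current index, is what would make yesterday's result set recoverable after a bulk load, at the cost of keeping every snapshot and every retired ranking function around.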



On 8/31/2011 7:01 AM, James Weinheimer wrote:
> On 30/08/2011 20:04, Joseph Montibello wrote:
> <snip>
>> Jim W. wrote:
>>> Google does not allow any kind of "guaranteed" or "standardized"
>>> access--just the opposite. If the results vary for you and me, and
>>> even vary for ourselves depending on where we are searching from,
>>> plus it is tweaked almost twice a day, I think the public could possibly
>>> understand the argument for a more standardized means of access.
>> I think personalized is better, from the perspective of most patrons.
>> If you're doing research in medicine, you probably want to privilege
>> recent stuff over older stuff. However, this doesn't mean that the
>> metadata needs to be personalized. The underlying data needs to be
>> standardized, but that doesn't mean the presentation of the data
>> (including search result ranking) should be one-size-fits-all.
>>
>> Why does Google tweak their algorithm constantly? Lots of reasons, I'm
>> sure, and not all of them would be comforting to us. But I do think
>> that they've shown an ability to produce useful results. So I'd argue
>> against aiming at standardized access for all patrons. Returning
>> personalized results sends a message to the patron - "we're trying to
>> help you." In many cases, our standardized results tell the patrons
>> "We think we have the answers, and one of those answers is that
>> there's a whole skillset that you need to learn before you can do what
>> you thought you wanted to do."
> </snip>
>
> Yes, thanks for clarifying that for me. Patrons should be able to work
> as they wish with the search results, but the results themselves, at
> least a part of them, should be standardized in some way to permit
> guaranteed access, i.e. a search that worked yesterday should work today
> and tomorrow as well.
>
> <snip>
>> The best part of the video was its emphasis on big-time systematic
>> testing and evidence-based decision making. One guy mentioned that for
>> every time a certain feature didn't work, they wanted to be sure it
>> worked 50 times. I suspect there's no sound reason for that exact
>> ratio; it's just a practical target they can shoot for.
>>
>> How can we get that kind of production testing?
> </snip>
>
> It takes a lot of resources and control over your own systems. A single,
> rich corporation like Google can do it, but for a diverse,
> loosely-organized group such as librarians, it would be much more
> difficult. Related to your previous comment, I think it's important to
> show our patrons that we are *trying* to improve matters *for them*, and
> that means there will be experiments, of which some might fail. Although
> failure is not such a great thing, I think the general populace
> understands that nothing is perfect and everything can be improved.
> That's how Google etc. work, and perhaps that is the lesson we should
> take: gradual, tiny improvements.
>

-- 
Jimmy Ghaphery
Head, Library Information Systems
VCU Libraries
http://www.library.vcu.edu
--
Received on Wed Aug 31 2011 - 09:20:27 EDT