Re: How Google makes improvements to its search algorithm

From: James Weinheimer <weinheimer.jim.l_at_nyob>
Date: Wed, 31 Aug 2011 16:30:46 +0200
To: NGC4LIB_at_LISTSERV.ND.EDU
On 31/08/2011 15:17, Jimmy Ghaphery wrote:
<snip>
> I am fascinated by the notion of imprecise or custom search results 
> and the way in which it challenges our expectations in the libraries.
>
> An important aspect to the appropriateness of fuzzy results is the 
> characteristics of the underlying data. In the case of Google we are 
> talking about a huge data set that can at best be loosely corralled. 
> In this context, using additional data such as usage patterns and 
> geographic location of the searcher makes perfect sense to me. For a 
> scientist searching a genomic database, it makes sense that results 
> need to be predictable and repeatable.
>
> It is not crystal clear to me where library data might fit along this 
> continuum. Considering the potential scope of the next generation 
> catalog I do think we need to embrace notions of rich algorithms and 
> rapid iteration to tease out relevant results. In reality our results 
> change every day that we add records (sometimes radically if we are 
> bulk loading). How scientific do we need to be here? Do we entertain 
> requests for a researcher who wants to see results from our previous 
> system or the results we presented from a search even a year ago?
</snip>

I remember a BBC news report about how, because of various tweaks, 
Google kept losing a city in Florida, and the consequences for the 
people living in that town! 
http://news.bbc.co.uk/2/hi/programmes/world_news_america/9038870.stm. 
(When I read a story like this, I often "teleport" back in time 25 years 
mentally and try to imagine what I would think. I would find this one 
completely incomprehensible!) I sent a post to Autocat 
http://catalogingmatters.blogspot.com/2010/09/disappearing-cities.html 
where I discussed my own views, and there was a short dialog.

One suggestion for where library data might fit was made by Eric 
Hellman in a talk at ALA, which I mentioned in another post to Autocat 
that provoked more dialog. 
http://comments.gmane.org/gmane.education.libraries.autocat/40227. To 
make sure that I was not misinterpreting him, I wrote him and he got 
involved too, in another thread 
http://comments.gmane.org/gmane.education.libraries.autocat/40267. 
Essentially, he was saying that in the future, people would very rarely 
interact with library metadata as they do now (i.e. by looking at 
catalog records), and that it would instead be used more as "microdata" 
http://en.wikipedia.org/wiki/Microdata_%28HTML5%29 behind the scenes, 
for re-sorting and reworking search results, or for search engine 
optimization. I mentioned the Google Books project with all of its 
metadata, which most people probably don't even know about, but there 
has to be a lot going on behind the scenes there.
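To make the "microdata behind the scenes" idea concrete, here is a 
minimal sketch (my own illustration, not anything from Hellman's talk) 
of a catalog record embedded in a web page as HTML5 microdata using the 
schema.org Book vocabulary -- the record and its values are hypothetical:

```python
# A hypothetical catalog record, reduced to a few key/value pairs.
record = {
    "name": "Moby-Dick",
    "author": "Herman Melville",
    "datePublished": "1851",
}

# Render the record as an HTML5 microdata fragment. A search engine
# crawling the page can read these itemprop values and use them to
# re-sort or enrich results, even though a human visitor never sees
# anything that looks like a "catalog record".
props = "\n".join(
    f'  <meta itemprop="{key}" content="{value}">'
    for key, value in record.items()
)
html = f'<div itemscope itemtype="http://schema.org/Book">\n{props}\n</div>'
print(html)
```

The point is only that the metadata keeps doing its work invisibly, 
inside the page markup, rather than being displayed as a record.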

There is a very definite role for library metadata in the future. I 
personally think it has to do with ensuring a level of standardization 
that guarantees Google's misplacing of towns doesn't happen because of 
the inevitable tweaks. Also, it becomes clearer and clearer to me that 
people really don't like to interact with the library's catalog--how it 
works, how it looks, even what it is. The catalog is becoming a strange 
thing to the average person. I think Hellman is onto something and may 
be on the right track toward a solution.

Seen in this sense, the Google raters example may prove invaluable.

-- 
James Weinheimer  weinheimer.jim.l_at_gmail.com
First Thus: http://catalogingmatters.blogspot.com/
Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/
Received on Wed Aug 31 2011 - 10:35:14 EDT