Re: Harvesting of data by Google

From: Stephens, Owen <o.stephens_at_nyob>
Date: Wed, 18 Mar 2009 12:50:20 +0000
To: NGC4LIB_at_LISTSERV.ND.EDU

This presentation has just come across my Twitter feed, and I think it is relevant both to pondering why Dave's pages are ranking higher than Google Books or Amazon, and more generally to how we can make our OPACs more 'of the web' and therefore more likely to appear in search engine results. It's from Michael Smethurst at the BBC. If we had OPACs that followed some of the rules laid out here, I'd be a whole lot happier :)

http://www.bbc.co.uk/blogs/bbcinternet/2009/03/designing_for_your_least_able.html

Owen

Owen Stephens
Assistant Director: eStrategy and Information Resources
Central Library
Imperial College London
South Kensington Campus
London
SW7 2AZ

t: +44 (0)20 7594 8829
e: o.stephens_at_imperial.ac.uk

> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of David Pattern
> Sent: 18 March 2009 09:58
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] Harvesting of data by Google
> 
> I've been experimenting with sitemaps for a couple of months and have
> managed to get a sizeable chunk of our catalogue onto Google.  If you
> try a search for "rise of the labour party" we're appearing on the
> first page (above Google Books and Amazon UK).
> 
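Dave doesn't say how his sitemaps are generated. As a minimal sketch, assuming each catalogue record has a crawler-facing page at a URL like the hypothetical http://library.example.ac.uk/record/<id>, a sitemap in the sitemaps.org XML format can be built with Python's standard library. The protocol allows at most 50,000 URLs per sitemap file, so a whole catalogue would normally be split across several files (tied together by a sitemap index, not shown here).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# The protocol limits a single sitemap file to 50,000 URLs.
MAX_URLS = 50000

# Hypothetical URL pattern for the crawler-facing record pages.
RECORD_URL = "http://library.example.ac.uk/record/{}"


def build_sitemap(record_ids):
    """Return one sitemap document (str) with a <url> entry per record ID."""
    ids = list(record_ids)
    if len(ids) > MAX_URLS:
        raise ValueError("too many URLs for one sitemap; use split_sitemaps()")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rid in ids:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the text for us.
        ET.SubElement(url, "loc").text = RECORD_URL.format(rid)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def split_sitemaps(record_ids, size=MAX_URLS):
    """Yield sitemap documents containing at most `size` URLs each."""
    ids = list(record_ids)
    for start in range(0, len(ids), size):
        yield build_sitemap(ids[start:start + size])
```

The resulting files can be listed in robots.txt with a "Sitemap:" line, which the protocol supports, so crawlers can find them without manual submission.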
> We're cheating slightly by blocking Google from crawling the OPAC and
> instead presenting a simple text page of the bib details and links to
> other "people who borrowed this, also borrowed" items (you can probably
> see this page by clicking on the cache link).  Anyone/thing hitting the
> page who isn't a spider/bot should get automatically redirected through
> to the relevant page in the OPAC.
> 
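The crawler/human split described above could work roughly as follows. This is a sketch only: the User-Agent patterns, the record structure, the OPAC deep-link pattern (the hypothetical http://opac.example.ac.uk/record=<id>) and the use of a server-side 302 redirect are all assumptions, since the message doesn't say how the redirect is actually done. Keeping crawlers out of the OPAC itself would be handled separately, for example with a Disallow rule in robots.txt.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical, deliberately short list of crawler User-Agent fragments;
# a production list would need to be longer and maintained.
CRAWLER_UA = re.compile(r"bot|spider|crawl|slurp", re.IGNORECASE)

# Hypothetical deep-link pattern for the full OPAC record display.
OPAC_URL = "http://opac.example.ac.uk/record={}"


def is_crawler(user_agent):
    """Heuristic: treat any User-Agent matching CRAWLER_UA as a spider/bot."""
    return bool(user_agent) and CRAWLER_UA.search(user_agent) is not None


def render_text_page(record):
    """Simple HTML page: bib details plus 'also borrowed' links."""
    links = "".join(
        '<li><a href="/record/{}">{}</a></li>'.format(quote(rid), escape(title))
        for rid, title in record["also_borrowed"]
    )
    return (
        "<html><head><title>{t}</title></head><body>"
        "<h1>{t}</h1><p>{a}</p>"
        "<h2>People who borrowed this also borrowed</h2><ul>{l}</ul>"
        "</body></html>"
    ).format(t=escape(record["title"]), a=escape(record["author"]), l=links)


def route_request(record, user_agent):
    """Return (status, headers, body) for a request to a record page."""
    if is_crawler(user_agent):
        # Spiders get the lightweight text page to index.
        return 200, {"Content-Type": "text/html; charset=utf-8"}, render_text_page(record)
    # Everyone else is redirected to the same record in the OPAC.
    return 302, {"Location": OPAC_URL.format(quote(record["id"]))}, ""
```

User-Agent matching is only a heuristic: anything that doesn't match the pattern, including a request with no User-Agent at all, is treated as a person and redirected.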
> The big question is why Google has ranked our pages higher than the
> likes of Amazon.
> 
> regards
> Dave Pattern
> Library Systems Manager
> 
> ________________________________________
> From: Next generation catalogs for libraries [NGC4LIB_at_LISTSERV.ND.EDU]
> On Behalf Of Stephens, Owen [o.stephens_at_IMPERIAL.AC.UK]
> Sent: 18 March 2009 09:38
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] Harvesting of data by Google
> 
> I've experienced the vagaries of Google even on a straightforward
> website - I was responsible for a UK Uni web presence and one day our
> entire site was dropped from the Google index for reasons we never
> discovered. After a lot of investigation, and many attempts to raise
> the issue with Google (which never got us anywhere), the site
> reappeared as suddenly as it had disappeared.
> 
> I don't think we should necessarily rely solely on Google. However,
> publishing on the web in a crawlable way is still fundamental to being
> found by any search engine, and by users. I don't think this rules out
> vertical search approaches at all (Amazon provides its own search as
> well as regularly appearing in Google's search results), but we need to
> ensure these approaches go together, not one without the other.
> Currently most library catalogues provide vertical search but have no
> chance of appearing in broader web searches, whether from Google or
> anyone else.
> 
> Owen