Re: Harvesting of data by Google

From: Casey Bisson <cbisson_at_nyob>
Date: Wed, 18 Mar 2009 16:18:32 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

The single most important problem in making library resources findable
in Google or other search engines is putting them online with stable
URLs in the first place. That's rule number one of my "rules of the
Google economy[1]":

Linking must be possible
Linking must be desirable
Linking must be measurable

The second problem we face is that even though citing books and other
works is desirable, the library form of citation is very different
from what search engines expect: links. So once our catalogs are
reliably linkable, we need to encourage linking to the resources in
them.

When professors claim that Google doesn't have results they value, I
point out that they likely haven't linked to the sources they value,
and that if they have, they probably haven't linked in a way Google
can index (the third problem).
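
To make "a way Google can index" concrete, here's a hypothetical
before-and-after (the citation, session URL, and JavaScript handler
are invented for illustration; the record URL is the example from
[2]). A citation or a script-driven link gives Googlebot nothing to
follow; a plain anchor to a stable URL does:

    Invisible to the crawler:
      A plain-text citation: Smith, Jane. An Important Book, 2007.
      A script link: <a href="#" onclick="showRecord(341391)">record</a>
      A session-bound URL: http://opac.example.edu/session/a83kz/record?id=341391

    Crawlable:
      <a href="http://library.plymouth.edu/read/341391">An Important Book</a>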

To make linking easier, I've been adding code snippets to the record
displays that allow people to embed a book jacket and link to our
catalog simply by copying and pasting[2]. The "embed this" code
snippets on YouTube made sharing videos on blogs and bulletin boards
easy and fun; why not try the same with library content?
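
Roughly, an embed snippet along those lines might read as follows
(this is an illustrative sketch, not the exact markup bSuite
generates; the cover-image URL is invented, and the record URL is the
example from [2]):

    <a href="http://library.plymouth.edu/read/341391">
      <img src="http://library.plymouth.edu/covers/341391.jpg"
           alt="book jacket" />
    </a>
    <br />
    <a href="http://library.plymouth.edu/read/341391">See this record
    in our catalog</a>

Paste that into a blog post or bulletin board and the page gets the
jacket image plus exactly the kind of stable, crawlable link Google
counts.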

Tim is right to point out the caveats he did, but those are problems
most of us should be excited to face... once we make linking possible
at all.

[1]: http://maisonbisson.com/projects/rules-of-the-google-economy/

[2]: example: http://library.plymouth.edu/read/341391#bsuite_share_embed


--Casey Bisson

http://maisonbisson.com
http://about.scriblio.net



On Mar 18, 2009, at 11:38 AM, Tim Spalding wrote:

> *Library links don't always mean what Google expects. They aren't
> "votes" and they aren't relevancy-ranked. For example, every book with
> a "Love Stories" LCSH will link to a list of books with that LCSH, but
> in most catalogs the list will be arbitrarily ordered. To the PR
> algorithm, however, page one is more important than page two.
> *Libraries duplicate content pell-mell. To take the HiP catalog I
> reviewed recently again, each book has ~five subpages with bits of
> data on it. This runs the risk of triggering the "Googlebot found an
> extremely high number of URLs on your site" error (delivered to your
> Google Webmaster Central page). I know this problem well. LibraryThing
> triggers it every day. If our sitemap showed every page we'd be way
> over the 50,000,000 limit.
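
Incidentally, Google's newly announced rel=canonical hint speaks
directly to the duplicate-subpage problem Tim describes: each of a
record's subpages can point the crawler back at the main record page,
which then absorbs the duplicates. A sketch of the tag that would go
in each subpage's <head> (the catalog URL here is invented):

    <link rel="canonical"
          href="http://catalog.example.edu/record/341391" />
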
Received on Wed Mar 18 2009 - 16:20:25 EDT