Re: Harvesting of data by Google

From: Tim Spalding <tim_at_nyob>
Date: Wed, 18 Mar 2009 11:38:29 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Before I did LibraryThing I worked in SEO, and I dispensed advice much
like that presentation. But the problems I deal with every day are
rather different, owing to the extreme size of the site. Libraries
have the same problems, and for the same reasons, mostly. So I'd be
interested to hear more of what people think about the special
problems of library SEO.

In brief, library SEO is funny because:

*Most SEO thinking and writing assumes small brochure sites.
*Most SEO thinking and writing focuses on winning in narrow slices: "I
want my Cancun Vacation page to show up higher than so-and-so's!"
Libraries don't care so much about that; they just want everything "in
there."

So, in a weird way, libraries with browsable catalogs are thinking
about problems normally considered only by the tippy-top of the SEO
world, the people who optimize for Amazon and AbeBooks, for example.
And library SEO focuses on a problem those people aren't as excited
by: coverage over ranking.

Some other weird things about library SEO:

*Library links don't always mean what Google expects. They aren't
"votes" and they aren't relevancy-ranked. For example, every book with
a "Love Stories" LCSH will link to a list of books with that LCSH, but
in most catalogs the list will be arbitrarily ordered. To the PageRank
algorithm, however, page one is more important than page two (see the
toy PageRank sketch after this list).
*Libraries duplicate content pell-mell. To return to the HiP catalog
I reviewed recently: each book has ~five subpages with bits of data on
them. This runs the risk of triggering the "Googlebot found an
extremely high number of URLs on your site" error (delivered to your
Google Webmaster Central page). I know this problem well. LibraryThing
triggers it every day. If our sitemap showed every page we'd be way
over the sitemap protocol's 50,000-URL-per-file limit (see the sitemap
sketch at the end of this list).
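
To make the page-one vs. page-two point concrete, here is a toy sketch
in Python. It is my own illustration, not anything from the catalogs
discussed here: the URLs and the simplified PageRank iteration are
made up, but they show how the first page of a subject listing
collects more PageRank than the second page simply because every
record links to page one, even though the ordering is arbitrary.

# Toy sketch: why page 1 of an arbitrarily ordered subject listing
# outranks page 2. Every book record links to page 1; page 2 is only
# reachable from page 1. URLs below are hypothetical.

def pagerank(links, iterations=50, d=0.85):
    """Plain iterative PageRank over a dict of page -> list of outgoing links."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - d) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += d * share
        rank = new_rank
    return rank

# Hypothetical catalog fragment: three records carry the same LCSH and
# each links to page 1 of the "Love Stories" listing; page 1 links on
# to page 2.
links = {
    "book/1": ["subject/love-stories?page=1"],
    "book/2": ["subject/love-stories?page=1"],
    "book/3": ["subject/love-stories?page=1"],
    "subject/love-stories?page=1": ["book/1", "book/2", "subject/love-stories?page=2"],
    "subject/love-stories?page=2": ["book/3"],
}

ranks = pagerank(links)
print(ranks["subject/love-stories?page=1"] > ranks["subject/love-stories?page=2"])  # True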
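
And here is a minimal sketch, again my own illustration rather than
LibraryThing's actual setup, of the usual way a very large site stays
within the sitemap protocol's 50,000-URL-per-file limit: split the URL
list into numbered sitemap files and tie them together with a sitemap
index. The host name and URL pattern are hypothetical.

# Split a long URL list into sitemap files of at most 50,000 URLs each,
# plus a sitemap index that points to them. Host and paths are made up.

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol

def write_sitemaps(urls, base="https://opac.example.edu"):  # hypothetical host
    """Yield (filename, xml) pairs: one sitemap per 50,000 URLs, plus an index."""
    filenames = []
    for i in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        chunk = urls[i:i + MAX_URLS_PER_SITEMAP]
        name = f"sitemap-{i // MAX_URLS_PER_SITEMAP + 1}.xml"
        body = "\n".join(f"  <url><loc>{u}</loc></url>" for u in chunk)
        filenames.append(name)
        yield name, ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                     f"{body}\n</urlset>\n")
    # The index file points the crawler at each individual sitemap file.
    entries = "\n".join(f"  <sitemap><loc>{base}/{n}</loc></sitemap>" for n in filenames)
    yield "sitemap-index.xml", ('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                                f"{entries}\n</sitemapindex>\n")

# Usage: list one canonical URL per record, not every subpage variant.
record_urls = [f"https://opac.example.edu/record/{n}" for n in range(120_000)]
for filename, xml in write_sitemaps(record_urls):
    print(filename, xml.count("<loc>"), "URLs")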

Best,
Tim
Received on Wed Mar 18 2009 - 11:40:05 EDT