Re: crawling the catalog

From: Casey Bisson <cbisson_at_nyob>
Date: Wed, 9 Apr 2008 17:29:53 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Steven,

My interest in making my catalog easily indexable by search engines
(and easily linkable by users) was a big part of my motivation in
developing Scriblio. One of the problems with letting Google or others
crawl many current catalogs is that they often embed session
information in their URLs or use otherwise unstable, irrational URLs.
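One way around session-laden URLs is to publish a sitemap of stable, session-free permalinks for crawlers to follow. This is just a minimal sketch under assumed names: the base URL and record IDs are hypothetical stand-ins for whatever stable identifiers a given catalog exposes.

```python
# Sketch: emit a Sitemap-protocol XML file listing one clean,
# session-free permalink per catalog record.
# BASE and the record IDs below are hypothetical examples.
from xml.sax.saxutils import escape

BASE = "https://library.example.edu/record/"  # hypothetical permalink base


def sitemap(record_ids):
    """Return sitemap XML with one <url> entry per catalog record."""
    urls = "\n".join(
        "  <url><loc>%s</loc></url>" % escape(BASE + str(rid))
        for rid in record_ids
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + urls
        + "\n</urlset>"
    )


print(sitemap([1001, 1002]))
```

The point of the sketch is only that each record gets exactly one canonical URL, with no session token, so a crawler (and a citing user) always lands on the same address.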

Still, a big part of this depends on having our users link to the
catalog, as inbound links are the single biggest source of relevance
information in most search engines. The academic and book worlds are
rich with citations, but those citations aren't in formats that are as
meaningful on the web as a simple link. I'm not sure whether we'll be
better off building citation parsers or changing the behavior of those
doing the citing, but once we solve that problem, libraries will be
very well represented online.

--Casey



On Apr 9, 2008, at 1:03 PM, Steven Harris wrote:

> Someone at our library asked recently about getting Google to crawl
> our catalog.  The primary motivation was to reveal some unique items
> in Special Collections to a wider audience.  I find this description
> of an experiment several years ago:
>
>   http://www.theshiftedlibrarian.com/2003/02/03.html#a3569
>
> Basically, it requires the creation of an XML surrogate of the
> catalog.  What's the status of this idea?  Possible?  Desirable?
> Hopelessly labor-intensive?  Stupid? Superseded by other
> approaches?  The materials are already in OCLC, so I don't know what
> a Google crawl of our data would add.  Just a chin-scratching
> morning here today.


Casey Bisson
__________________________________________

Information Architect
Plymouth State University
Plymouth, New Hampshire
http://Plymouth.edu/
http://about.Scriblio.net/
http://MaisonBisson.com/
ph: 603-535-2256
Received on Wed Apr 09 2008 - 16:12:21 EDT