Re: crawling the catalog

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Wed, 9 Apr 2008 13:21:56 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

More recent discussion of a similar idea can be found in the Code4Lib
Journal:

Googlizing a Digital Library
Jody DeRidder
http://journal.code4lib.org/articles/43

It's still not obvious to me whether an XML surrogate gets you anything,
unless you just can't make your actual HTML crawlable. I think making
the actual HTML crawlable is highly preferable.
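
For what it's worth, the standard way to point Google at crawlable HTML
record pages is the Sitemaps protocol (sitemaps.org): publish an XML
file listing one <loc> per record page and reference it from robots.txt.
A minimal sketch in Python, using a hypothetical catalog base URL and
made-up record IDs (neither is from this thread):

```python
# Sketch: build a Sitemaps-protocol XML file pointing Google at
# one crawlable HTML page per catalog record.
# The base URL and record IDs are hypothetical examples.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(record_ids, base_url="https://catalog.example.edu/record/"):
    """Return sitemap XML (as a string) with one <url> per record ID."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for rid in record_ids:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(url, "{%s}loc" % SITEMAP_NS)
        loc.text = base_url + str(rid)
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["b1234", "b5678"]))
```

The sitemap only helps discovery, of course; each listed URL still has
to serve plain HTML that a crawler can fetch without a session or
POSTed search form.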

Jonathan

Steven Harris wrote:
> Someone at our library asked recently about getting Google to crawl our catalog.  The primary motivation was to reveal some unique items in Special Collections to a wider audience.  I found this description of an experiment from several years ago:
>
>    http://www.theshiftedlibrarian.com/2003/02/03.html#a3569
>
> Basically, it requires the creation of an XML surrogate of the catalog.  What's the status of this idea?  Possible?  Desirable?  Hopelessly labor-intensive?  Stupid?  Superseded by other approaches?  The materials are already in OCLC, so I don't know what a Google crawl of our data would add.  Just a chin-scratching morning here today.
>
>
>
> Steven R. Harris
> Collection Development Librarian
> Utah State University
> (435) 797-3861
> http://collections2point0.wordpress.com/
>
>

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Received on Wed Apr 09 2008 - 12:04:42 EDT