Re: The Dirty Little Secrets of Search

From: Karen Coyle <lists_at_nyob>
Date: Sun, 13 Feb 2011 09:02:49 -0800
To: NGC4LIB_at_LISTSERV.ND.EDU

Some bib databases have made themselves available to search engines,  
including Open Library, LibraryThing and GoodReads. (I think WorldCat  
does this as well, but I'm not sure.) It often takes a few pages into  
a Google retrieved set to reach one of them, I'm sure in part because  
Google's ranking is based on links -- that is, based on who links to  
the pages. There is undoubtedly some linking to them, but if you  
multiply those entries by every library catalog... well, the odds of a
library getting "noticed" in the Google sense are pretty low because
the links would be to a gazillion scattered databases, even though the  
thing linked to would be the same. Ideally one should be able to use a  
general work identifier that links to library holdings. -- Wait!  
That's WorldCat! And the rules are:

Data.  OCLC grants you a nonexclusive, nontransferable license to use  
Data solely for Non-Commercial Use.  The following activities are  
prohibited and you agree not to engage in (or permit) such activities:

- use of Data for Commercial Use, in any manner not expressly  
authorized by these Terms or in any unlawful manner;
- use of Data for cataloging;
- making more than one (1) copy per screen display;
- use of bots, spiders, or other automated information-gathering  
devices or programming routines to "mine" or harvest material amounts  
of Data;
- distribution, display or disclosure of Data except to the extent  
reasonably incidental to Non-Commercial Use; and
- permanent or long-term storage of Data (including, but not limited  
to, creation of or repackaging in a database containing material  
amounts of Data).

Whew! That seems to rule out a lot of possibilities. The Open  
Bibliography folks have been talking about reasonable licensing for  
bibliographic data [1] and one of the discussions is that the  
non-commercial prohibition basically stops all use of data because you  
cannot guarantee that, as data travels around cyberspace, it won't
get mashed up with something tainted with commerciality (which perhaps  
even means Google and other search engines).

kc
[1] http://openbiblio.net/principles/

Quoting Thomas Krichel <krichel_at_OPENLIB.ORG>:

>   Weinheimer Jim writes
>
>> This is a rat race that libraries should do their best to avoid.
>
>   I am sure they will heed your advice. When have libraries ever
>   taken part in a rat race? ;-) And in the slow march towards engine
>   visibility, they have not even started to get moving. If a search
>   engine can find that I live in Jackson Heights, NY, should it not
>   point my query for "Moby Dick", say, to the copy of it in a close-by
>   public library?  It won't be able to. Library catalogs are not
>   visible to search engine crawlers unless libraries prepare a
>   complete browsable index to their holdings on the public web. I am
>   sure it's not difficult to set up such pages. It would probably take
>   me a couple of hours to do it for Koha, the system I am familiar
>   with, to set up an ugly and primitive one. Please correct me if I am
>   wrong, but it does not appear to be a standard feature of ILS
>   software. It ought to be.
>
>
>   Cheers,
>
>   Thomas Krichel                    http://openlib.org/home/krichel
>                                 http://authorclaim.org/profile/pkr1
>                                                skype: thomaskrichel
>
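
To make the "complete browsable index" idea in the quoted message concrete, here is a rough sketch in Python of what such pages could look like: plain, paginated HTML lists in which every holding is reachable by following ordinary links, which is all a crawler needs. Everything named here is an assumption for illustration: the record fields (id, title, author), the /record/<id> URL pattern, and the function names are hypothetical and are not taken from Koha or any other ILS.

```python
from html import escape
from math import ceil
from urllib.parse import quote


def page_name(page_no):
    """File name for an index page; page 1 is the entry point."""
    return "index.html" if page_no == 1 else f"index-{page_no}.html"


def build_index_pages(records, base_url, per_page=100):
    """Return {file name: HTML} for a paginated, link-only list of holdings.

    `records` is a list of dicts with hypothetical keys id, title, author.
    The record URL pattern (base_url + "/record/" + id) is an assumption.
    """
    ordered = sorted(records, key=lambda r: r["title"].lower())
    n_pages = max(1, ceil(len(ordered) / per_page))
    pages = {}
    for page_no in range(1, n_pages + 1):
        chunk = ordered[(page_no - 1) * per_page : page_no * per_page]
        # One plain link per holding, so a crawler can reach every record.
        items = "\n".join(
            '<li><a href="{}/record/{}">{}</a> ({})</li>'.format(
                base_url,
                quote(str(r["id"])),
                escape(r["title"]),
                escape(r["author"]),
            )
            for r in chunk
        )
        # Links to every other index page, so no page is orphaned.
        nav = " ".join(
            f'<a href="{page_name(n)}">{n}</a>'
            for n in range(1, n_pages + 1)
            if n != page_no
        )
        pages[page_name(page_no)] = (
            "<html><head><title>Holdings, page {}</title></head>\n"
            "<body>\n<ul>\n{}\n</ul>\n<p>{}</p>\n</body></html>\n"
        ).format(page_no, items, nav)
    return pages


if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "Moby Dick", "author": "Herman Melville"},
        {"id": 2, "title": "Bartleby", "author": "Herman Melville"},
    ]
    for name, html in build_index_pages(sample, "https://catalog.example.org").items():
        print(name, len(html), "bytes")
```

The pages are deliberately as "ugly and primitive" as described: no search box, no JavaScript, just links. Exporting the records from the catalog and publishing the resulting files is left out, since that part depends entirely on the system in use.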

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet