Re: Search/retrieve access is to library data what Gopher was to the web?

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Thu, 21 Aug 2008 14:53:44 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Karen Coyle wrote:
> Allowing a search engine to index the data shouldn't be a legal 
> problem, I wouldn't think. The problem is that the data isn't 
> available in a crawlable way -- it's stuck in databases that only 
> speak Z39.50. Where the legal problem comes in is that people feel 
> they can't put their data out on the open web, right?
I don't think allowing specific hand-picked search engines to crawl on a 
case-by-case basis will lead to the kind of innovation we need. If we 
could all just put the records out there, and say that anyone can crawl 
them, then different people would try different ways to crawl them (in 
the library world and elsewhere), and out of that would come some good 
ideas. That's how innovation happens, you know? Even if I, inside the
library world, want to experiment with this, I don't have the capacity 
to ask every single library everywhere for permission and special custom 
access to their records. But if they all could just put the records out 
for anyone to crawl, I could find time to experiment with it.

Hence my suggestion that our limitation here is a legal/business one.

Jonathan

>
> I was thinking that one of the barriers is that what we have is this 
> highly formalized metadata. The first problem with that is that we 
> have it in MARC, which no one other than libraries understands. The 
> second is that metadata is highly concentrated -- and web search tends 
> to be on full text and takes a shotgun approach rather than the 
> precise approach of library catalogs. Because the data is 
> concentrated, keyword searching is often unsatisfactory -- and the web 
> thrives on keyword searching.
>
> If we COULD surface all of the library metadata to the web, then I 
> think that we'd need to do something other than just treat each record 
> as a web page. I think we'd need to create a layer of merged data so 
> that each book (manifestation) is represented as few times as possible 
> (ideally once, but we know how hard that is), and we'd need a work 
> layer as well. And we'd need ways to navigate, not just search. 
> Linking books that cite each other (like following URLs). I guess
> that's the other problem with our metadata -- no interaction between 
> records, few links (I'm thinking of the 'related works' fields). So 
> much to do!
>
> kc
>
> Jonathan Rochkind wrote:
>> Sadly, I think much of the barrier is legal/business: OCLC members
>> are not allowed (or believe they are not allowed) to share their
>> complete records with all and sundry. Being able to share their
>> complete corpus with all and sundry is what would set the ground for 
>> innovation. You never know who is going to provide this, but once you 
>> make it possible, somebody will.
>>
>> Jonathan
>>
>> Martin Malmsten wrote:
>>> Hi all,
>>>
>>> I am simply going to throw down the gauntlet and say that 
>>> search/retrieve access to library data is not good enough. For too 
>>> long have library data been trapped within data silos only
>>> accessible through obscure protocols. Why is access to library data 
>>> still an issue? This was solved in a matter of months on the web, 
>>> when Excite (or whichever search engine was first) was introduced. 
>>> Why are there not at least ten search engines containing the 
>>> majority of the world's bibliographic data?
>>>
>>> Yes, I am stating/asking the obvious.
>>>
>>> So, Linked Data for libraries, anyone?
>>>
>>> Best regards,
>>>   Martin, who really wants a discussion about Linked Data
>>>
>>
>
>

-- 
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886 
rochkind (at) jhu.edu