Re: Tim Berners-Lee on the Semantic Web

From: Eric Lease Morgan <emorgan_at_nyob> Date: Tue, 27 Oct 2009 22:02:32 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

I have not been ignoring the discussions on our mailing list; please  
do not consider me to be the absentee list owner. I only feel the need  
to step in when things get ugly, when the discussion seems to be  
between only two individuals, or the discussion is really off topic.  
So far none of this has really been the case.

That being said, I would like to add my two cents:

   1. NGC4Lib - The purpose of the list is/was to discuss issues  
surrounding the idea of "'next generation' library catalogs". In  
retrospect, the word "catalog" was not a very good one, but it is/was  
the only word we had. The word "catalog" brings along with it so many  
different connotations. Its meaning is too ambiguous. Since then the  
phrase "discovery system" has been added to the vernacular to denote  
the indexing of library content and its subsequent searching. Good
examples of this type of software are VuFind, AquaBrowser, Primo,
Blacklight, etc. In reality, all of these applications are indexers
(not databases), and most of them are based on an open source tool
called Lucene. Add Web 2.0 features to your Lucene index of library
content and apparently you have a "'next generation' library catalog".
Even now these ideas are being blurred by the introduction of other
initiatives such as Summon, Primo Central, OpenPHI, and some of the
work I've heard about from Ebsco. Other things are in the mix too,
including
Rochester's eXtensible Catalog (XC) and the DLF's Open Library  
Environment (OLE). The sum of all of these things is expected to be  
the topic of NGC4Lib.

   2. Linked data - A lot of the discussion of late has surrounded the  
concept of linked data. The idea is relatively simple. Make your data
available via HTTP, and within your data, link to other linked data.
Through such a process a "web" of content will make itself available
-- a web that can be read by computers, that lets them find
relationships between data that would be difficult for humans to
discover, and that literally expands the proverbial sphere of
knowledge. Simply dumping our bibliographic, authority, and holdings
information as MARC would accomplish this goal, but very poorly.
Transforming it into MARCXML would be a step in the right direction,
but not a big one. This is true for two reasons. First, the
MARC format is not XML, and therefore not easily parsed/understood by  
the larger Internet community. It requires knowledge of the "secret
code book" -- knowledge of what 1xx, 245, and 6xx mean in the
context of a bibliographic or authority record. Second, and more
importantly, its data is too string-based. The value of a 100 field
ought to be a key -- think unique identifier akin to a relational  
database key -- not a name or title. These keys are the URIs of the  
linked data world. I'm not sure what the best solution is, but I would  
consider dumping our MARC records from our catalogs, transforming them  
into some implementation of RDF, converting the values in the 1xx  
fields, 6xx fields, 7xx fields, and 245 fields to the URIs from places
such as dbpedia.org or OCLC's authorities, and then letting the Web do
the rest.

   3. Collections and services - Suppose the linked data scenario  
comes to fruition. Suppose our local "discovery systems" work  
perfectly. What then? A person does a search. A computer suggests  
additional items of interest. The next question is, "Great, give it to  
me." Is that the sum of what we are about? With the increasing  
availability of content on the Web, how is providing access to  
information a niche librarianship can fill? It isn't. We don't create  
content, and there are too many competitors. Publishers will  
eventually provide direct access to their content. Open access  
publishers will have their content on the Web. Heck, even the books  
will be found in Google Books, the HathiTrust, or the Internet
Archive. Access to content will not be the problem to solve. Instead,
the problem will be how to use, understand, and synthesize that
content within the context of the user. This is a role libraries are
more than able to
fill. We are expected to know our users. We know what classes they are  
taking, and what classes they are teaching. We know what their  
dissertation was on, and we know their major discipline of study. We  
know they are in the business of government, making widgets, or saving  
lives. Our users are looking for information to help them do their  
work. We can provide the tools enabling them to do this work more
quickly, more easily, and more efficiently. It is not just about
collections.
Everybody will have collections. It is about collections and services.  
It is about putting the collections in context. Collections without  
services are useless, and services without collections are empty. A  
library needs to provide both, especially in a digital environment. I  
think the "'next generation' library catalog" is an embodiment of this  
idea.
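The distinction in point 1 between an indexer and a database can be
illustrated with a toy inverted index, the data structure at the heart
of tools like Lucene. What follows is a minimal Python sketch of the
general idea only; it is not Lucene's API, and the function names
(`tokenize`, `build_index`, `search`) and sample records are
hypothetical. Instead of scanning every record at query time, the
index maps each word to the records that contain it, so a search
becomes a set intersection.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each word to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index

def search(index, query):
    """Return IDs of records containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Hypothetical sample records keyed by an identifier.
records = {
    "rec1": "Semantic Web and linked data",
    "rec2": "Library catalogs in the next generation",
    "rec3": "Linked data for library catalogs",
}
index = build_index(records)
print(sorted(search(index, "linked data")))       # ['rec1', 'rec3']
print(sorted(search(index, "library catalogs")))  # ['rec2', 'rec3']
```

A real indexer adds relevance ranking, stemming, fielded search, and
much more; this sketch only returns the set of matching records.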
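The suggestion in point 2 (replace heading strings with URIs, then
express the record as RDF) can be sketched in a few lines of Python
that emit N-Triples, one of the standard RDF serializations.
Everything specific here is an assumption made for illustration: the
tag-to-predicate crosswalk (using Dublin Core terms), the authority
lookup table, and the `example.org` URIs are placeholders, not real
authority data from dbpedia.org or OCLC, and the record is reduced to
simple (tag, value) pairs.

```python
# Hypothetical crosswalk from MARC tags to Dublin Core terms; a real
# mapping would also need indicators, subfields, repeated fields, etc.
DCTERMS = "http://purl.org/dc/terms/"
TAG_TO_PREDICATE = {
    "100": DCTERMS + "creator",
    "245": DCTERMS + "title",
    "650": DCTERMS + "subject",
}

# Hypothetical authority lookup: heading string -> URI. In practice the
# URIs would come from an authority source; example.org is a placeholder.
AUTHORITY_URIS = {
    "Berners-Lee, Tim": "http://example.org/authority/berners-lee-tim",
    "Semantic Web": "http://example.org/authority/semantic-web",
}

def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def record_to_ntriples(record_uri, fields):
    """Turn (tag, value) pairs into N-Triples lines, using URIs when known."""
    lines = []
    for tag, value in fields:
        predicate = TAG_TO_PREDICATE.get(tag)
        if predicate is None:
            continue  # tag not covered by this toy crosswalk
        uri = AUTHORITY_URIS.get(value)
        # A known heading becomes a URI; anything else stays a string literal.
        obj = f"<{uri}>" if uri else f'"{escape_literal(value)}"'
        lines.append(f"<{record_uri}> <{predicate}> {obj} .")
    return lines

# Hypothetical sample record.
fields = [
    ("100", "Berners-Lee, Tim"),
    ("245", "Weaving the Web"),
    ("650", "Semantic Web"),
    ("650", "World Wide Web"),  # no URI known, so it stays a string
]
for line in record_to_ntriples("http://example.org/record/1", fields):
    print(line)
```

Where no URI is known, the value falls back to a plain string literal,
which is exactly the string-based situation described above; the more
headings that resolve to URIs, the more the resulting triples can link
to other data on the Web.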

-- 
Eric Lease Morgan
University of Notre Dame