Re: A fundamental question we seem to be dancing around

From: Eric Lease Morgan <emorgan_at_nyob> Date: Fri, 13 Feb 2009 10:33:24 -0500 To: NGC4LIB_at_LISTSERV.ND.EDU

On 2/13/09 10:05 AM, "B.G. Sloan" <bgsloan2_at_yahoo.com> wrote:

> Allowing for searching at the article/chapter level, or even smaller levels,
> assumes that this granular level of data exists in the catalog record. In many
> (most?) cases it does not.
> 
> Does this mean we'd have to do a massive data conversion project?

IMHO, the short answer is, "No."

Technically speaking, a library catalog is a type of index. Look up a word,
phrase, author, title, etc. Get back a pointer. In the case of a library
catalog, this pointer is (almost always) a call number. In the case of a
back-of-the-book index, the pointer is a page number. In the case of a
bibliographic index, the pointer is a citation. In the case of an Internet
index, the pointer is a URL.
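
To make the "look up a word, get back a pointer" idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular indexer works; the function names, the call number, and the URL are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(entries):
    """Map each word to the set of pointers whose description contains it."""
    index = defaultdict(set)
    for description, pointer in entries:
        for word in tokenize(description):
            index[word].add(pointer)
    return index

def lookup(index, word):
    """Return the sorted pointers for a word, or an empty list if none."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical entries: a pointer can be a call number, a URL, and so on.
entries = [
    ("Walden by Henry David Thoreau", "PS3048 .A1"),
    ("Civil Disobedience by Henry David Thoreau", "http://example.org/civil"),
]
index = build_index(entries)
print(lookup(index, "thoreau"))
```

The index does not care what kind of pointer it hands back. That is why a call number and a URL can sit side by side in the same result.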

No, there is no need to have a conversion project. Instead, all you (we)
need to do is add things to our index.

Indexes are NOT databases. Indexes, for the most part, are used for find. On
the other hand, databases, for the most part, are used to manage and
report.* Used together, indexes and databases can be exploited to create
integrated library systems. Examples of indexers include: Zebra, swish-e,
Lucene, KinoSearch, etc. Examples of databases include Oracle, MySQL, and
Postgres.

For example, it is entirely possible to export our MARC records and feed
them to an index. In fact, this is *exactly* what all the "next generation"
library catalog things do, and many of them use Lucene as the underlying
indexer.
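
As a rough sketch of the export-and-feed step, the Python below uses an in-memory SQLite table as a stand-in for the database of local holdings and a plain dictionary as a stand-in for the indexer. It does not parse real MARC records, and the table name, columns, and call numbers are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# The database manages the holdings (hypothetical table and rows).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE holdings (call_number TEXT, author TEXT, title TEXT)")
db.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?)",
    [
        ("QA76.9 .D3", "Date", "An Introduction to Database Systems"),
        ("Z699 .S3", "Salton", "Introduction to Modern Information Retrieval"),
    ],
)

# Reporting is a database job: count the records being managed.
(record_count,) = db.execute("SELECT COUNT(*) FROM holdings").fetchone()

# Finding is an index job: export every row and feed it to the index.
index = defaultdict(set)
for call_number, author, title in db.execute("SELECT * FROM holdings"):
    for word in f"{author} {title}".lower().split():
        index[word].add(call_number)

print(record_count)
print(sorted(index["introduction"]))
print(sorted(index["salton"]))
```

The database stays the place where records are managed. The index is rebuilt from an export whenever the records change.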

When it comes to other content, the same process can be applied. Export
content and feed it to the index. Journal article citations are a good
example. This data can come from places like the DOAJ and their OAI feed, or
from vended indexes. Yesterday I learned about a service called ticTOC that
outputs RSS feeds of tables of contents. Read RSS feeds. Get article-level
metadata. Feed it to your index. Metadata (and/or full text) describing
ebook content is another example. Get the content from the Open Content
Alliance. Institutional repository content is another example. In these
cases the art of librarianship centers around answering the question, "What
content do I want to include in my index?"
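
The "read RSS feeds, get article-level metadata, feed it to your index" steps might be sketched like this in Python. The feed below is a made-up sample, assumed to be RSS 2.0; a real table-of-contents feed may use a different flavor of RSS and would be fetched over the network rather than embedded in the script.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A hypothetical table-of-contents feed, assumed to be RSS 2.0.
FEED = """<rss version="2.0">
  <channel>
    <title>Journal of Examples</title>
    <item>
      <title>Indexing library catalogs</title>
      <link>http://example.org/articles/1</link>
    </item>
    <item>
      <title>Harvesting catalogs with OAI</title>
      <link>http://example.org/articles/2</link>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Yield a (title, link) pair for each item in the feed."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        yield item.findtext("title", default=""), item.findtext("link", default="")

def add_to_index(index, items):
    """Index each word of an article title, pointing at the article's URL."""
    for title, link in items:
        for word in title.lower().split():
            index[word].add(link)

index = defaultdict(set)
add_to_index(index, read_items(FEED))
print(sorted(index["catalogs"]))
```

The same add_to_index step would work for any other source that can be reduced to a description and a pointer, which is why no conversion of the local records is involved.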

No. There is no need for a conversion project because this additional
content is not necessarily expected to be in the database of local holdings;
instead, it is in the index.

* The difference between "find" and "manage" seems to exemplify the
fundamental differences between the information retrieval community and the
library community. In general, librarianship puts emphasis on organizing
information -- think classification -- whereas the information retrieval
community puts emphasis on find -- think keyword search.

-- 
Eric "Glad This Discussion Is Taking Place" Morgan
Hesburgh Libraries, University of Notre Dame