descriptive cataloging

From: Eric Lease Morgan <emorgan_at_nyob> Date: Tue, 3 Nov 2009 11:15:05 -0500 To: NGC4LIB_at_LISTSERV.ND.EDU

To what degree can our descriptive cataloging practices be enhanced  
considering the increasing availability of full text content?

Take, for example, the size of a book. Currently we describe its
physical dimensions and number of pages, but does that really tell me
how long the book is? With the availability of full text, a "catalog"
can report the number of words in a document, which is a much better
indication of length. This book, The Prince by Machiavelli, is 31,234
words long (short), whereas this other book, War and Peace by Tolstoy,
is 565,454 words long (very long). In other words, War and Peace is
about 18 times longer than The Prince.
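Below is a minimal sketch of how a word count might be derived from full text. It assumes the text is available as a plain string. It also assumes that a "word" is any run of letters, digits, or apostrophes. Other tokenization rules give somewhat different counts, so this is not necessarily how the figures above were produced. The function names are hypothetical.

```python
import re

# Assumption: a "word" is any run of letters, digits, or apostrophes.
# Other tokenization rules will produce somewhat different counts.
WORD = re.compile(r"[A-Za-z0-9']+")

def word_count(text: str) -> int:
    """Return the number of words in a plain-text document."""
    return len(WORD.findall(text))

def relative_length(longer_count: int, shorter_count: int) -> float:
    """How many times longer one document is than another, by word count."""
    return longer_count / shorter_count
```

For example, word_count("It was a dark and stormy night.") returns 7. Using the counts quoted above, relative_length(565_454, 31_234) is roughly 18.1, which is where the "18 times" figure comes from.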

There are other possibilities for descriptive cataloging. How about
intended grade level or "readability"? There are well-established
tests for scoring the grade level and/or readability of a document.
[1] Word processors use these tests when grammar checking your
writing. The United States government uses these tests to ensure its
population can read its documentation. Given full text, it is possible
to score individual library items and/or whole library collections.
Want to read a book intended for a 5th grader? Here are some. Are you
a college student? Then these other books may be more apropos.
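One of the tests described at [1] is the Flesch-Kincaid grade level. Its formula is 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59.

The sketch below scores a plain-text document with that formula. It makes two simplifying assumptions:

- Sentences are found by splitting on ".", "!" and "?".
- Syllables are estimated with a crude vowel-group heuristic.

Established readability tools count syllables more carefully, so their scores will differ. Very simple text can score below zero.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels,
    then drop one for a silent final 'e' (but not for endings like 'table')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a plain-text document."""
    # Naive sentence split; abbreviations such as "Mr." will inflate the count.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Running the same function over every item's full text would give each item a number that can be stored, searched, and averaged across a collection.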

Why not describe authors in greater detail? Then a reader could say,
"I only want a list of items written by women."

Examine parts of speech. Use a part-of-speech tagger to count
things like the number of nouns, pronouns, various types of verbs,
adjectives, etc., in a document and then say, "This document contains
a larger-than-normal number of active verbs; this is an action
book."
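The tagging step itself can be delegated to an existing tagger. For example, NLTK's nltk.pos_tag() takes a list of tokens and returns (word, tag) pairs using Penn Treebank tags. It needs its tagger model downloaded first.

The sketch below starts from already-tagged tokens and works in two steps:

- It computes what share of the tokens falls into each coarse part-of-speech group.
- It flags a document whose verb share is well above a collection-wide baseline.

Note that it counts all verbs as a proxy. Telling active verbs from passive constructions would need further analysis. The baseline, the 25% margin, and the function names are hypothetical.

```python
from collections import Counter

# Coarse groupings of Penn Treebank tags (the tag set nltk.pos_tag uses by
# default). Other taggers may use other tag sets.
TAG_GROUPS = {
    "noun": ("NN", "NNS", "NNP", "NNPS"),
    "pronoun": ("PRP", "PRP$", "WP", "WP$"),
    "verb": ("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
    "adjective": ("JJ", "JJR", "JJS"),
    "adverb": ("RB", "RBR", "RBS"),
}

def pos_profile(tagged: list[tuple[str, str]]) -> dict[str, float]:
    """Return each group's share of all tagged tokens (punctuation included)."""
    if not tagged:
        return {}
    counts = Counter()
    for _word, tag in tagged:
        for group, tags in TAG_GROUPS.items():
            if tag in tags:
                counts[group] += 1
    return {group: counts[group] / len(tagged) for group in TAG_GROUPS}

def looks_like_action(profile: dict[str, float],
                      baseline_verb_share: float,
                      margin: float = 1.25) -> bool:
    """Hypothetical rule: flag a document whose verb share exceeds the
    collection-wide baseline by the given margin (25% by default)."""
    return profile.get("verb", 0.0) > baseline_verb_share * margin
```

In practice the baseline would itself come from tagging the whole collection and averaging the verb share. "Larger than normal" then means relative to the library's own holdings.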

"I want to read a very short, action-packed book, written by a man who  
lived in the 19th century but whose story takes place in the Middle  
Ages, and I'm only in the 7th grade so please suggest something  
accordingly." Given adequate descriptive cataloging details, such a
query would be trivial. Given the full text of a book, creating these
details would be trivial as well. We have full text. Hmm...
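To illustrate why such a query becomes simple once the details are recorded as coded and numeric data, here is a sketch over an in-memory list of records. Every field name and every threshold below is hypothetical. In particular, "very short" is taken to mean at most 60,000 words, and "lived in the 19th century" is taken to mean a lifetime overlapping 1801-1900. Real catalog records would need to carry equivalent controlled values.

```python
from dataclasses import dataclass

@dataclass
class Item:
    # Every field is hypothetical: coded or numeric values a catalog could
    # derive from full text or record as controlled data.
    title: str
    author_sex: str          # coded value, e.g. "male" or "female"
    author_birth_year: int
    author_death_year: int
    setting: str             # coded period of the story, e.g. "Middle Ages"
    word_count: int
    grade_level: float       # e.g. a Flesch-Kincaid score
    action_score: float      # e.g. verb share relative to the collection

def lived_in_19th_century(item: Item) -> bool:
    """True if the author's lifetime overlaps the years 1801-1900."""
    return item.author_birth_year <= 1900 and item.author_death_year >= 1801

def suggest(items, max_words=60_000, max_grade=7.0, min_action=1.25):
    """Short, action-packed books by 19th-century men, set in the Middle
    Ages, readable by a 7th grader. All thresholds are illustrative guesses."""
    return [
        item for item in items
        if item.author_sex == "male"
        and lived_in_19th_century(item)
        and item.setting == "Middle Ages"
        and item.word_count <= max_words
        and item.grade_level <= max_grade
        and item.action_score >= min_action
    ]
```

Each clause is a plain comparison against a key or a number. Free-text strings would instead require parsing or guesswork.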

Once all of these things are represented in our "catalog", not only
can we apply them to searches, but we can also apply analytics
services to the search results. "Your search for author=plato
returned the following 29 items. Click here to display: 1) a timeline
of when they were written, 2) a histogram illustrating each document's
length, 3) a spider chart showing the degree to which each discusses
core philosophic concepts." Not only is the content of our metadata
records too string-based rather than key-based, it is also weak in
terms of quantitative data. Thus mathematical analysis is difficult,
if not impossible. Remember, computers are great when it comes to math.
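As a sketch of the "histogram of each document's length" service, assume each record in a result set carries its word count as a number. The function names and the 50,000-word bin width below are hypothetical. Such a result set can be binned and charted with a few lines of arithmetic. The same would not be possible with a free-text physical description of pages and dimensions.

```python
from collections import Counter

def length_histogram(word_counts, bin_size=50_000):
    """Group word counts into fixed-width bins; keys are each bin's lower bound.
    Empty bins are omitted."""
    bins = Counter((n // bin_size) * bin_size for n in word_counts)
    return dict(sorted(bins.items()))

def render(histogram, bin_size=50_000):
    """Print a plain-text bar chart, one '#' per item in each bin."""
    for lower, count in histogram.items():
        upper = lower + bin_size - 1
        print(f"{lower:>9,}-{upper:>9,} words | {'#' * count}")
```

For example, render(length_histogram([31_234, 565_454])) would chart The Prince and War and Peace in two very different bins. A timeline of composition dates would be the same kind of grouping, applied to years instead of word counts.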

Much of our time is spent figuring out new ways to automate old
processes, when we should also be spending some of it learning
how to use our tools to provide totally new and different services. A
"'next generation' library catalog" will be an embodiment of such
services.

[1] http://en.wikipedia.org/wiki/Readability_test

-- 
Eric Lease Morgan
University of Notre Dame