To what degree can our descriptive cataloging practices be enhanced
given the increasing availability of full-text content?
Take, for example, the size of a book. Currently we describe its
physical dimensions and number of pages, but does that really tell me
how long the book is? With the availability of full text, a "catalog"
can record the number of words in a document, which is a much better
indication of length. The Prince by Machiavelli is 31,234 words long
(short), whereas War and Peace by Tolstoy is 565,454 words long (very
long). In other words, War and Peace is about 18 times longer than The
Prince.
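Counting words and comparing lengths is simple arithmetic once the full text is at hand. Below is a minimal Python sketch. It assumes each item's full text is available as a UTF-8 plain-text file, and its tokenizing rule (runs of letters or digits, allowing internal apostrophes and hyphens) is an assumption of this example, so its counts may differ somewhat from other tools and will not necessarily reproduce the figures quoted above. The file names in the usage comment are hypothetical.

```python
import re
import sys

# Assumed tokenizing rule: a "word" is a run of letters or digits, optionally
# joined by internal apostrophes or hyphens ("don't", "well-established").
# A bare "--" contains no letters, so it is not counted.
WORD = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")


def word_count(text: str) -> int:
    """Number of words in a document's full text."""
    return len(WORD.findall(text))


def times_longer(longer: int, shorter: int) -> float:
    """How many times longer one document is than another."""
    return longer / shorter


if __name__ == "__main__":
    # Usage (hypothetical file names): python wordcount.py prince.txt war-and-peace.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(path, word_count(handle.read()))
    # With the counts quoted above, 565,454 / 31,234 is roughly 18.1.
    print(f"{times_longer(565_454, 31_234):.1f}")
```

Storing the count as a number, rather than as a phrase like "xii, 245 p.", is what makes the comparison a one-line division.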
There are other possibilities for descriptive cataloging. How about
intended grade level or "readability"? There exist well-established
tests for scoring the grade level and/or readability of a document.
[1] Our word processing programs use these tests when grammar checking
your writing. The United States government uses these tests to ensure
its population can read its documentation. Given full text, it is
possible to score individual library items and/or whole library
collections. Want to read a book intended for a 5th grader? Here are
some. Are you a college student? Then these other books may be more
apropos.
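One of the tests listed in [1] is the Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies it to a string of full text. The syllable counter is a rough vowel-group heuristic chosen for this example (an assumption, not part of the formula), and sentence boundaries are approximated by runs of ".", "!", or "?", so the scores are estimates.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    w = word.lower()
    groups = len(re.findall(r"[aeiouy]+", w))
    # "make" loses its final e, but "table" keeps the syllable in "-ble".
    silent_e = w.endswith("e") and not (
        w.endswith("le") and len(w) > 2 and w[-3] not in "aeiouy"
    )
    if silent_e and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def grade_level(text: str) -> float:
    """Estimate the grade level of a piece of full text."""
    words = WORD.findall(text)
    if not words:
        raise ValueError("no words to score")
    # Treat each run of ., !, or ? as one sentence ending.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), sentences, syllables)


print(round(grade_level("The cat sat on the mat. It was happy."), 2))
```

Scoring a whole collection could then be as simple as computing this number for every item and reporting the distribution, for example the median grade level per shelf or subject.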
Why not classify authors in greater detail? Then a patron could say,
"I only want a list of items written by women."
Examine parts of speech. Use a part-of-speech tagger to count things
like the number of nouns, pronouns, various types of verbs,
adjectives, etc., in a document and then say, "This document contains
a larger than normal number of active verbs; this book is an action
book."
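Here is a minimal sketch of the counting step. It assumes the text has already been run through a tagger that emits Penn Treebank tags as (word, tag) pairs, which is what NLTK's `nltk.pos_tag` returns; the tagging itself is left out so the example stays self-contained. Telling active from passive voice would take more analysis than this, so the sketch uses the overall share of verbs as a crude proxy, and the baseline and margin in `is_action_heavy` are hypothetical values that a real system would measure across the collection.

```python
from collections import Counter

# Map Penn Treebank tag prefixes to coarse categories
# (e.g. VB, VBD, VBG, VBN, VBP, VBZ are all verbs).
CATEGORIES = {"NN": "noun", "PRP": "pronoun", "VB": "verb",
              "JJ": "adjective", "RB": "adverb"}


def pos_profile(tagged):
    """Return each coarse category's share of all tagged tokens."""
    counts = Counter()
    for _word, tag in tagged:
        for prefix, name in CATEGORIES.items():
            if tag.startswith(prefix):
                counts[name] += 1
                break
    total = len(tagged) or 1  # avoid dividing by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES.values()}


def is_action_heavy(profile, baseline_verb_share=0.15, margin=1.25):
    """Flag a text whose verb share exceeds a (hypothetical) baseline."""
    return profile["verb"] > baseline_verb_share * margin


tagged = [("He", "PRP"), ("ran", "VBD"), ("and", "CC"), ("jumped", "VBD")]
profile = pos_profile(tagged)
print(profile["verb"], is_action_heavy(profile))
```

Punctuation tokens count toward the total here; excluding them would be a reasonable refinement.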
"I want to read a very short, action-packed book, written by a man who
lived in the 19th century but whose story takes place in the Middle
Ages, and I'm only in the 7th grade so please suggest something
accordingly." Given adequate descriptive cataloging details, such a
query would be trivial. Given the full text of a book, creating many
of these details would be straightforward as well. We have full text.
Hmm...
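To show why such a query becomes trivial once the details are recorded as keys and numbers, here is a sketch over a hypothetical record structure. The field names, the `suggest` function, the word-count cutoff for "very short", and the sample records are all invented for illustration; no existing catalog schema is implied.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical catalog record with numeric and key-based fields."""
    title: str
    word_count: int
    grade_level: float
    action_heavy: bool
    author_gender: str  # controlled value, e.g. "male" or "female"
    author_born: int
    author_died: int
    setting: str        # controlled period name, e.g. "Middle Ages"


def lived_in_century(record: Record, century: int) -> bool:
    """True if the author's lifetime overlaps the century (19 -> 1801-1900)."""
    start, end = (century - 1) * 100 + 1, century * 100
    return record.author_born <= end and record.author_died >= start


def suggest(records, *, max_words=60_000, max_grade=7.0, gender="male",
            century=19, setting="Middle Ages"):
    """Each clause of the patron's request becomes one simple comparison."""
    return [
        r for r in records
        if r.word_count <= max_words
        and r.action_heavy
        and r.author_gender == gender
        and lived_in_century(r, century)
        and r.setting == setting
        and r.grade_level <= max_grade
    ]


# Invented sample records, for illustration only.
records = [
    Record("Example Title A", 40_000, 6.5, True, "male", 1810, 1870, "Middle Ages"),
    Record("Example Title B", 400_000, 9.0, True, "male", 1828, 1910, "Middle Ages"),
    Record("Example Title C", 35_000, 5.0, True, "female", 1815, 1880, "Middle Ages"),
]
print([r.title for r in suggest(records)])  # prints ['Example Title A']
```

The point of the design is that none of these comparisons requires parsing free text at query time; the work is done once, when the record is created from the full text.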
Once all of these things are represented in our "catalog", not only
can we apply them to searches, but we can also apply analytics
services against the search results: "Your search for author=plato
returned the following 29 items. Click here to display: 1) a timeline
of when they were written, 2) a histogram illustrating each document's
length, 3) a spider chart illustrating the degree to which each
discusses core philosophical concepts." Today, the content of our
metadata records is not only too string-based as opposed to
key-based, it is also weak in terms of quantitative data. This makes
mathematical analysis difficult, if not impossible. Remember,
computers are great at math.
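Once lengths and dates are stored as numbers, the first two of those displays reduce to a few lines of code. The sketch below bins word counts into a plain-text histogram and orders results by year of composition, using negative years for BCE. The function names, the bin size, and the sample values are hypothetical; a real service would hand the same numbers to a charting library rather than printing "#" characters.

```python
from collections import Counter


def length_histogram(word_counts, bin_size=10_000):
    """Count documents per word-count bin, keyed by each bin's lower bound."""
    bins = Counter((n // bin_size) * bin_size for n in word_counts)
    return dict(sorted(bins.items()))


def render_histogram(histogram, bin_size=10_000):
    """Draw the histogram as plain text, one '#' per document."""
    return "\n".join(
        f"{low:>7,}-{low + bin_size - 1:>7,} words | {'#' * count}"
        for low, count in histogram.items()
    )


def timeline(results):
    """Order (title, year) pairs by year; negative years stand for BCE."""
    return sorted(results, key=lambda item: item[1])


# Hypothetical search results: word counts and composition years.
counts = [3_000, 12_000, 15_500, 31_234]
print(render_histogram(length_histogram(counts)))
print(timeline([("Work B", -360), ("Work A", -390)]))
```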
Much of our time is spent figuring out new ways to automate old
processes, when we should also be spending some of it learning how to
use our tools to provide totally new and different services. A "next
generation" library catalog will be an embodiment of such things.
[1] http://en.wikipedia.org/wiki/Readability_test
--
Eric Lease Morgan
University of Notre Dame
Received on Tue Nov 03 2009 - 11:17:28 EST