Re: descriptive cataloging

From: Alejandro Garza Gonzalez <alejandro.garza_at_nyob>
Date: Wed, 4 Nov 2009 12:07:20 -0600
To: NGC4LIB_at_LISTSERV.ND.EDU
I bet Google is already doing some analysis of text for ranking 
results, not only in the Book Search service but also to enhance their 
web search.

_alejandro


Eric Lease Morgan said the following on 03/11/2009 10:15 a.m.:
> To what degree can our descriptive cataloging practices be enhanced  
> considering the increasing availability of full text content?
>
> Take, for example, the size of a book. Currently we describe its  
> physical dimensions and number of pages, but does that really tell me  
> how long the book is? With the availability of full text a "catalog"  
> can denote the number of words in a document -- which is a much better  
> indication of length. This book, The Prince by Machiavelli, is 31,234  
> words long (short). Whereas this other book, War And Peace by Tolstoy  
> is 565,454 words long (very long). Moreover, War And Peace is 18 times  
> longer than The Prince.
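
As a rough illustration of the word-count idea quoted above, here is a minimal sketch. It assumes a "word" is any run of letters, digits, or apostrophes, which is only one of several reasonable definitions; the function names `word_count` and `length_ratio` are made up for this example, and the two totals are the figures quoted in the message, not recomputed from any text.

```python
import re

def word_count(text: str) -> int:
    """Count words, taking a word to be a run of letters, digits, or apostrophes."""
    return len(re.findall(r"[A-Za-z0-9']+", text))

def length_ratio(words_a: int, words_b: int) -> float:
    """Return how many times as long document A is compared with document B."""
    return words_a / words_b

# Word totals as quoted in the message (not recomputed here).
PRINCE_WORDS = 31_234
WAR_AND_PEACE_WORDS = 565_454

ratio = length_ratio(WAR_AND_PEACE_WORDS, PRINCE_WORDS)
print(f"War And Peace is about {ratio:.1f} times as long as The Prince")
```

A different tokenizer (for example, one that splits hyphenated words) would shift the totals slightly, so counts are only comparable when every item is measured the same way.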
>
> There are other possibilities for descriptive cataloging. How about  
> intended grade level or "readability"? There exist well-established  
> tests used to score the grade level and/or readability of a document.  
> [1] Our word processing programs use these tests to grammar check your  
> writing. The United States government uses these tests to ensure its  
> population can read its documentation. Given full text it is possible  
> to score individual library items and/or whole library collections.  
> Want to read a book intended for a 5th grader? You are a college  
> student, and therefore these books may be apropos.
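
One of the well-known tests of this kind is the Flesch-Kincaid grade level, which needs only sentence, word, and syllable counts. The sketch below is an assumption-laden illustration, not how any particular word processor does it: sentences are split on `.`, `!`, and `?`, and syllables are estimated by counting runs of vowels, which is a crude heuristic. The names `count_syllables` and `fk_grade` are hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    """Crude estimate: one syllable per run of vowels, never fewer than one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level from word, sentence, and syllable counts."""
    words = re.findall(r"[A-Za-z']+", text)
    # Assumption: a sentence ends at '.', '!' or '?'; abbreviations will miscount.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(fk_grade("The cat sat. The dog ran."))
```

Very simple text can score below zero, so a catalog would probably want to clamp or bucket the value before showing it to a patron.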
>
> Why not classify the author to a greater degree? I only want a list of  
> items written by women.
>
> Examine parts of speech. Use a parts of speech application to count  
> things like the number of nouns, pronouns, various types of verbs,  
> adjectives, etc., in a document and then say, "This document contains  
> a larger than normal number of active verbs; this book is an action  
> book."
>
> "I want to read a very short, action-packed book, written by a man who  
> lived in the 19th century but whose story takes place in the Middle  
> Ages, and I'm only in the 7th grade so please suggest something  
> accordingly." Given adequate descriptive cataloging details such a  
> query would be trivial. Given the full text of a book, creating these  
> details would be trivial as well. We have full text. Hmm...
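
To make the 7th grader's request concrete, here is a sketch of how such a query might look once the details are stored as keyed, numeric fields. Everything in it is hypothetical: the `Record` fields, the `search` function, the thresholds, and the three placeholder records are invented for illustration and do not describe real books or any existing catalog schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """Hypothetical catalog record with key-based, quantitative fields."""
    title: str
    word_count: int
    grade_level: float
    active_verb_ratio: float  # share of words tagged as active verbs
    author_gender: str
    author_century: int
    setting: str

def search(records, *, max_words, max_grade, min_active_verbs,
           gender, century, setting):
    """Return the records that satisfy every constraint of the request."""
    return [
        r for r in records
        if r.word_count <= max_words
        and r.grade_level <= max_grade
        and r.active_verb_ratio >= min_active_verbs
        and r.author_gender == gender
        and r.author_century == century
        and r.setting == setting
    ]

# Placeholder data, invented for this example only.
CATALOG = [
    Record("Placeholder A", 40_000, 6.5, 0.30, "m", 19, "Middle Ages"),
    Record("Placeholder B", 500_000, 11.0, 0.12, "m", 19, "Napoleonic era"),
    Record("Placeholder C", 35_000, 7.0, 0.28, "f", 20, "Middle Ages"),
]

# "Very short, action-packed, written by a man of the 19th century,
# set in the Middle Ages, suitable for the 7th grade."
hits = search(CATALOG, max_words=50_000, max_grade=7.0,
              min_active_verbs=0.25, gender="m", century=19,
              setting="Middle Ages")
print([r.title for r in hits])
```

The point of the sketch is that each clause of the request maps onto one comparison, which only works if the catalog holds numbers and controlled keys rather than free-text strings.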
>
> Once all of these things are represented in our "catalog", not only  
> can we apply them to searches, but we can also apply analytics  
> services against the search results. "Your search for author=plato  
> returned the following 29 items. Click here to display: 1) a timeline  
> of when they were written, 2) a histogram illustrating each document's  
> length, 3) a spider chart connoting the degree each discusses core  
> philosophic concepts". Not only is the content of our metadata records  
> too string-based as opposed to key-based, it is also weak in terms of  
> quantitative data. Thus mathematical analysis is difficult, if not  
> impossible. Remember, computers are great when it comes to math.
>
> Much of our time is spent figuring out new ways to automate old  
> processes, when we should also be spending some of our time learning  
> how to use our tools to provide totally new and different services. A  
> "'next generation' library catalog" will be an embodiment of such  
> things.
>
> [1] http://en.wikipedia.org/wiki/Readability_test
>
>   

-- 
_________________ ___ _ _ _ _ _ _ _
*Ing. Alejandro Garza González*
Project Coordination and Systems Development
Centro Innov@TE, Centro para la Innovación en Tecnología y Educación
Tecnológico de Monterrey

Tel. +52 [81] 8358.2000, Ext. 6751
Intercampus line: 80.689.6751, 80.788.6106
http://www.itesm.mx/innovate/

The content of this data transmission must not be considered an offer, 
proposal, understanding or agreement unless it is confirmed in a 
document signed by a legal representative of ITESM. The content of this 
data transmission is confidential and is intended to be delivered only 
to the addressees. Therefore, it shall not be distributed and/or 
disclosed through any means without the authorization of the original 
sender. If you are not the addressee, you are forbidden from using it, 
either totally or partially, for any purpose.
Received on Wed Nov 04 2009 - 13:11:02 EST