Re: our profession's bibliographic information

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Tue, 21 Dec 2010 14:55:08 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

>> In the context of my previous message, there are two types of data:  
>> 1) quantitative, and 2) qualitative. The former is applicable to  
>> mathematical processes. The latter is not.
> 
> But you can quantify what you call qualitative data, that is, data  
> that is not numeric. You can count anything, as the applications that  
> are making use of full text are doing. You can make "more related to"  
> calculations even using words ("this word is more related to another  
> word than that word" or "A has a greater relationship to B than C has  
> a relation to B"). I'm not sure why you would limit yourself to  
> numerical data, rather than countable data. Once you count, you turn  
> your data into quantity. Based on the nature of our data, I think  
> that's where we'll get bang for our computational buck.

Only things that are represented as numbers are countable. I can't count The Adventures of Huckleberry Finn. Nor can I count Origami--Juvenile works. Yes, I can count the number of books by Mark Twain a library owns, and I can count the number of works related to paper craft, but these tabulations tell me about the collection. I want to produce quantitative information on works, not the catalog. For example, some measurable characteristics of works may include:

  * Big Name index (percentage of quotes from leading authorities)
  * color index (normalized percentage of color words used)
  * date written
  * grade level
  * Great Ideas index (percentage of philosophy ideas in text)
  * length in words
  * librarian rating
  * number of citations
  * number of editions
  * number of graphics
  * number of pictures
  * number of prizes won
  * number of times circulated
  * percentage of languages used in a text
  * percentage of mathematical formulas in a text
  * percentage of unique words in a text
  * price
  * publisher rating
  * readability score
  * reader rating

Given imagination, I'm sure many more quantifiable characteristics could be enumerated.
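
A few of the text-based characteristics above can be computed directly from full text. What follows is only a minimal sketch in Python, not a finished tool. The names `COLOR_WORDS`, `tokenize`, and `measure` are hypothetical, the color vocabulary is deliberately tiny, and the "color index" here is a plain percentage of tokens, which is just one possible reading of "normalized":

```python
import re

# Hypothetical, deliberately small vocabulary; a real color index
# would need a much fuller list of color words.
COLOR_WORDS = {
    "red", "orange", "yellow", "green", "blue", "purple",
    "black", "white", "brown", "gray", "grey", "pink",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def measure(text):
    """Return a few quantitative characteristics of a text."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        # Avoid dividing by zero for an empty text.
        return {
            "length_in_words": 0,
            "percent_unique_words": 0.0,
            "color_index": 0.0,
        }
    color_count = sum(1 for word in words if word in COLOR_WORDS)
    return {
        "length_in_words": total,
        "percent_unique_words": 100.0 * len(set(words)) / total,
        "color_index": 100.0 * color_count / total,
    }


sample = "The red barn and the white fence stood by the blue river."
print(measure(sample))
```

Other items in the list, such as readability score or grade level, would need their own formulas, and the ratings and circulation counts would come from outside the text itself.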

Once done, these characteristics can be compared to one another, and they can be used from two sides of the same problem. On one hand such characteristics can be integrated into "discovery systems" (catalogs) to assist the reader in identifying items for use. "I want a book that is popular, contains a minimum of mathematical formulas, has many citations and illustrations, but is not too difficult to read." On the other hand, a person could identify an item not in a collection, feed the item to a system for analysis, and get back a list of characteristics about the item. "This item is longer than most, has many citations, is expensive, has a low reader rating, and is not very 'colorful'." Finally, some sort of graph or chart could be drawn, literally illustrating the characteristics of a given work.
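
Both sides of the problem can be sketched in a few lines of Python. This is an illustration under loud assumptions: the records and their values are made up, the field names and the functions `find` and `describe` are hypothetical, "readability" is assumed to be a score where higher means easier to read, and "more than most" is approximated by comparison against the collection's median:

```python
from statistics import median

# Hypothetical records with invented values, for illustration only.
collection = [
    {"title": "Work A", "length_in_words": 40000, "citations": 12,
     "percent_formulas": 0.5, "readability": 70},
    {"title": "Work B", "length_in_words": 120000, "citations": 85,
     "percent_formulas": 6.0, "readability": 35},
    {"title": "Work C", "length_in_words": 65000, "citations": 40,
     "percent_formulas": 0.2, "readability": 62},
]


def find(works, min_citations, max_percent_formulas, min_readability):
    """Discovery side: return titles of works meeting every threshold."""
    return [
        work["title"]
        for work in works
        if work["citations"] >= min_citations
        and work["percent_formulas"] <= max_percent_formulas
        and work["readability"] >= min_readability
    ]


def describe(item, works, fields):
    """Analysis side: compare an item to the collection's median per field."""
    report = {}
    for field in fields:
        middle = median(work[field] for work in works)
        if item[field] > middle:
            report[field] = "above the median"
        elif item[field] < middle:
            report[field] = "below the median"
        else:
            report[field] = "at the median"
    return report


# "Many citations, few formulas, not too difficult to read."
print(find(collection, min_citations=30, max_percent_formulas=1.0,
           min_readability=60))

# An item not in the collection, already measured by some other process.
new_item = {"length_in_words": 90000, "citations": 50}
print(describe(new_item, collection, ["length_in_words", "citations"]))
```

The same per-field comparisons are what a chart of a given work would plot.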

Granted, none of this was feasible a decade ago since there was little full text. Things are changing. Things are different now. Full text is becoming the norm, and this opens up all sorts of possibilities. Somebody is going to do this sort of work, if it isn't being investigated already. Libraries are not about books. They are about what is inside the books. We need to be providing tools enabling our constituents to use these insides lest the profession become marginalized. Find is not as much of a problem to solve as it used to be. People can find more than they need, and the amount of effort needed to find more is past the point of diminishing returns. Instead, use and understanding are the name of the game. Measurement is a standard means to understanding. Quantification is a necessary element of measurement.

-- 
Eric Lease Morgan
"Take the Great Books Survey -- http://bit.ly/auPD9Q"