Re: parts-of-speech

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Wed, 9 Feb 2011 22:02:41 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU
On Feb 9, 2011, at 12:40 PM, Laval Hunsucker wrote:

>>  http://bit.ly/hsxD2i
> 
> You're talking only about the full text of English-language documents 
> here, right ?
> 
> But even then, there's lots of room for fundamental ambiguity/
> uncertainty in your source of data, it seems to me. To deal with 
> such meaningfully, the software'd have to be enormously 
> sophisticated, no ? Can we even make software that 
> sophisticated ?


Thank you for your interest. In a nutshell, I perceive two uses for incorporating text mining functionality into library discovery systems:

  1. Enhanced evaluation - Books are produced with a number of finding aids. These include but are not necessarily limited to page numbers, tables of contents, back-of-the-book indexes, chapter headings, lists of figures, prefaces, and introductions. These things have not always existed in books. According to Ann Blair in her book Too Much to Know, such things were initially used as tools to deal with information overload. Look at the table of contents. Glance through the index. Look at the list of figures. All of these things help a person evaluate whether or not to spend their limited time reading the book. The tools I have implemented, while certainly not original, can help the reader do the same thing. The experiments surrounding parts-of-speech have initially proven unfruitful, but as I look at specific POS usage I think patterns will arise, and categorizations of text will become evident. If that is the case, then such a thing could be made part of a discovery system. "I am interested in longer works, written for graduate students." "I am interested in novels written in the first person from a female's point of view." Given the amount of full text available, such a thing is now possible. Wouldn't it be interesting to crawl through all of the digitized books and find hidden gems?

  2. Enhanced scholarship - It is possible to find new and nuanced patterns of writing through the use of text mining. By counting the words, tallying them, and comparing their usage with the usage in other works or by other authors, it is entirely possible to measure stylistic characteristics. By graphing and charting where concepts appear in texts, it is possible to literally illustrate the movement of a theme through a book. If one wants to be a Dickens scholar, then to what degree is a person expected to read all of Dickens? If the expectation is high, then text mining functions applied to the corpus will make that reading process easier or at least more efficient. Such things were not possible before the existence of so much full text. It is now far more feasible to get a handle on the totality of Victorian literature using these techniques. There are patterns, themes, and trends just waiting to be uncovered.
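
To make the counting and charting concrete, here is a minimal sketch in Python. It assumes plain-text input and a deliberately naive tokenizer, and the function names (tokenize, relative_frequencies, dispersion) are invented for illustration; it is not the code behind the tools linked above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes (naive)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text):
    """Map each word to its share of all the words in the text."""
    words = tokenize(text)
    if not words:
        return {}
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def dispersion(text, term, bins=10):
    """Count how often `term` occurs in each of `bins` equal slices of the text."""
    words = tokenize(text)
    counts = [0] * bins
    for position, word in enumerate(words):
        if word == term.lower():
            # Scale the word's position to a slice index between 0 and bins - 1.
            counts[position * bins // len(words)] += 1
    return counts
```

Comparing the output of relative_frequencies for two works, or for two authors, gives a rough measure of stylistic difference. Plotting the list returned by dispersion as a bar chart shows roughly where in a book a given word, and perhaps the concept behind it, comes and goes.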

In short, I see the sort of things I've been doing as a supplement to the reading process, much like tables of contents and the process of writing abstracts.

Just because a work has been translated from its original language does not mean the analysis against it is meaningless. It just means the analysis is less meaningful. A person can still learn a lot about classic philosophy from translations.

Finally, I wrote the tiniest of essays the other day and used it as the foundation for a presentation to the library faculty here at Notre Dame. It is little more than an outline, but it is full of "kewl" links that may make some of the things outlined above more clear. From the conclusion:

  In my mind, the combination of digital humanities computing
  techniques and the practices of librarianship would be a marriage
  made in heaven. By supplementing our collections with full text
  materials and then enhancing our systems to facilitate text
  mining, we can not only make it easier for readers to find data
  and information, but we can also make that data and information
  easier to use and understand. As Ranganathan said, "Save the time
  of the reader."
  
  http://bit.ly/h3G0RQ

Thank you for asking.

-- 
Eric Lease Morgan
Received on Wed Feb 09 2011 - 22:02:54 EST