Re: text mining

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Wed, 11 May 2011 13:51:10 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
On May 11, 2011, at 11:00 AM, Jonathan Rochkind wrote:

>> * Is the Pope always right?
>>   bib' record: http://www.catholicresearch.net/Record/undmarc_000743445
>>   concordance: http://www.catholicresearch.net/concordances/?id=undmarc_000743445
> 
> This is an interesting idea.
> 
> It occurs to me that HathiTrust already has the data (scanned full text) 
> for many items necessary to provide concordances....

Jonathan, thank you for the nod. And just for the record, all of the full text items in my Alex Catalogue of Electronic Texts are associated with concordances (and more). 

For example, a search in Alex for the word "philosophy", limited to Project Gutenberg, returns about 3,000 items. [1] After choosing An Introduction to Philosophy by Fullerton, we can see that the book is of average length and should be readable by a 12th grader. [2] Using the concordance function, we can see that the word "knowledge" appears 208 times in the book. [3] If you have a browser supporting SVG (XML-rendered graphics; most browsers except IE), then you can visualize the word "knowledge" and its associations with other words as a network diagram.
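
To make the concordance idea concrete, here is a minimal sketch in Python. It is not the code behind Alex; the function names (tokenize, concordance, neighbors), the two-word context window, and the sample sentence are all assumptions made up for illustration. It counts occurrences of a word, lists each one in context, and tallies nearby words, which is roughly the raw material a network diagram of word associations would need.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(text, keyword, width=2):
    """List each occurrence of keyword with up to `width` words on each side."""
    words = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        if word == keyword:
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            lines.append(" ".join(left + [word] + right))
    return lines

def neighbors(text, keyword, width=2):
    """Count the words that appear within `width` words of keyword."""
    words = tokenize(text)
    keyword = keyword.lower()
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            counts.update(words[max(0, i - width):i])
            counts.update(words[i + 1:i + 1 + width])
    return counts

# Hypothetical sample text standing in for the full text of a book.
sample = "Knowledge of the world is knowledge of things. Philosophy seeks knowledge."

hits = concordance(sample, "knowledge")
print(len(hits), "occurrences")
for line in hits:
    print(line)
print(neighbors(sample, "knowledge").most_common(3))
```

Run against the full text of a book instead of the sample string, the same three functions would give a word count, a keyword-in-context listing, and a list of co-occurring words to draw as edges in a graph.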

The point I'm trying to make is this: "If one has the full text of an item, then there are many things one can do with it that go beyond find."


[1] philosophy in Alex - http://bit.ly/lnk9Tr
[2] Introduction to Philosophy - http://infomotions.com/etexts/id/etext16406
[3] knowledge in the text - http://bit.ly/mMqHex
[4] network diagram - http://bit.ly/l3Dmtx

-- 
Eric Lease Morgan, Digital Projects Librarian
University of Notre Dame
(574) 631-8604

Great Books Survey -- http://bit.ly/auPD9Q
Received on Wed May 11 2011 - 13:51:36 EDT