On Aug 30, 2007, at 5:16 PM, Sperr, Edwin wrote:
> I wonder how well such approaches would work in an environment
> where the length of the texts is variable and the texts
> themselves often meander from point to point? Is there another
> test corpus that models library requirements better? Anybody
> banging at the Project Gutenberg docs yet?
I have done this to a small degree. Here's how:
1. Downloaded about 14,000 texts, mostly from Project Gutenberg.
2. Used a couple of tools to extract relevant sentences and
statistically significant words. (See the sketch after this list.)
3. Updated a database of titles, creators, texts, and keywords
accordingly.
4. Generated XHTML out of the whole thing.
5. Full-text indexed it.
6. Created a browsable/searchable interface.
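For the curious, steps 2 and 5 can be approximated with nothing
fancier than TF-IDF scoring and an inverted index. Here is a minimal
Python sketch under those assumptions; the directory name, tokenizer,
and top-N cutoff are made up for illustration and are not the actual
tools behind Alex:

#!/usr/bin/env python
# Hypothetical sketch: pull "statistically significant" words out of
# a directory of plain-text e-texts with TF-IDF, then invert the
# result into a toy searchable index. Names here are illustrative.

import math
import re
from collections import Counter, defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z]{3,}")  # crude tokenizer: lowercase, 3+ letters

def tokenize(text):
    return WORD.findall(text.lower())

def keywords(corpus_dir, top_n=10):
    """Return {filename: [top_n TF-IDF words]} for every *.txt file."""
    docs = {p.name: tokenize(p.read_text(errors="ignore"))
            for p in Path(corpus_dir).glob("*.txt")}
    n_docs = len(docs)
    # document frequency: how many texts each word appears in at all
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    result = {}
    for name, words in docs.items():
        if not words:
            continue
        tf = Counter(words)
        # TF-IDF: frequent in this text, rare across the corpus
        scores = {w: (count / len(words)) * math.log(n_docs / df[w])
                  for w, count in tf.items()}
        result[name] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return result

def build_index(keyword_map):
    """Invert the keyword map: word -> set of documents (step 5 in spirit)."""
    index = defaultdict(set)
    for doc, words in keyword_map.items():
        for w in words:
            index[w].add(doc)
    return index

if __name__ == "__main__":
    kw = keywords("etexts")  # hypothetical directory of e-texts
    index = build_index(kw)
    print(sorted(index.get("whale", set())))  # which docs are "about" whales?

A real system would index the full text, not just the keywords, but
the shape of the problem is the same.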
Considering I was only one person with relatively small hardware, I
think I did pretty well. It is not perfect, but it is a definite step
in the right direction. Just think what I could have done with ten or
so people. Hmmm... Try:
http://infomotions.com/alex/
--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604