Re: vufind, ead files, harvesting content, and text mining

From: Automation Department <croseburg_at_nyob> Date: Wed, 27 Oct 2010 10:34:54 -0700 To: NGC4LIB_at_LISTSERV.ND.EDU

Thanks for posting this, Eric. I enjoyed reading your Perl. The general
design of your project seems pretty elegant to me: reusable, smaller
pieces instead of one giant pile of unreadable spaghetti (a sin I commit
regularly).

Great hacks! Looking forward to watching this develop. Keep us posted.

-- 
Chad Roseburg
Automation Dept.
North Central Regional Library

On 10/27/2010 05:45 AM, Eric Lease Morgan wrote:
> I have written a couple of blog postings, along with some hacks, about VuFind, EAD files, harvesting content, and text mining that may be of interest to this group:
>
>    1. EAD files - The first posting and set of scripts describe how I am currently indexing MARC records and, more importantly, EAD files in VuFind. The process involves harvesting EAD files from remote locations, transforming them into HTML, indexing them at the container level, and providing access to the index. [1]
>
>    2. Internet Archive content - The second posting describes how I mirrored content from the Internet Archive, munged the mirrored MARC records, indexed them, and provided a rudimentary text-mining interface against the locally cached full text. [2]
>
> There are lots of cool (as well as "kewl") possibilities here.
>
> [1] Indexing EAD in VuFind - http://bit.ly/cIu0lG
> [2] Internet Archive content - http://bit.ly/dbzYyX
>