Re: vufind, ead files, harvesting content, and text mining

From: Automation Department <croseburg_at_nyob>
Date: Wed, 27 Oct 2010 10:34:54 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU

Thanks for posting this, Eric. I enjoyed reading your Perl. The overall
design of your project seems pretty elegant to me: small, reusable
pieces instead of one giant pile of unreadable spaghetti, a sin I
commit regularly.

Great hacks! Looking forward to watching this develop. Keep us posted.

-- 
Chad Roseburg
Automation Dept.
North Central Regional Library



On 10/27/2010 05:45 AM, Eric Lease Morgan wrote:
> I have written a couple of blog postings as well as some hacks surrounding VUFind, EAD files, harvesting content, and text mining that may be of interest to this group:
>
>    1. EAD files - The first posting and set of scripts describe how I am currently indexing MARC records and, more importantly, EAD files in VUFind. The process involves harvesting EAD files from remote locations, transforming them into HTML, indexing them at the container level, and providing access to the index. [1]
>
>    2. Internet Archive content - The second posting describes how I mirrored content from the Internet Archive, munged the mirrored MARC records, indexed them, and provided a rudimentary text mining interface against the locally cached full text. [2]
>
> There are lots of cool (as well as "kewl") possibilities here.
>
> [1] indexing EAD in VUFind - http://bit.ly/cIu0lG
> [2] Internet Archive content - http://bit.ly/dbzYyX
>
>    
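
For anyone wondering what "indexing at the container level" might look like in practice, here is a minimal Python sketch. It is not Eric's Perl and is not taken from his postings. The sample EAD fragment, the function name container_records, and the record fields are all hypothetical, and the sketch assumes un-namespaced EAD 2002 markup. The idea is that each <did> carrying <container> elements becomes one record that could later be sent to an index.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment, invented purely for illustration.
SAMPLE_EAD = """\
<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <container type="folder">3</container>
            <unittitle>Letters, 1901-1905</unittitle>
          </did>
        </c02>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <container type="folder">4</container>
            <unittitle>Letters, 1906-1910</unittitle>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def container_records(ead_xml):
    """Return one dict per <did> that names at least one <container>.

    Assumption: the finding aid uses un-namespaced EAD 2002 element names.
    """
    root = ET.fromstring(ead_xml)
    ead_id = root.findtext("eadheader/eadid", default="unknown")
    collection = root.findtext("archdesc/did/unittitle", default="")

    records = []
    for did in root.iter("did"):
        containers = did.findall("container")
        if not containers:
            # Collection- and series-level descriptions have no container.
            continue
        location = ", ".join(
            f"{c.get('type', 'container')} {(c.text or '').strip()}"
            for c in containers
        )
        records.append({
            "id": f"{ead_id}-{len(records) + 1}",  # hypothetical id scheme
            "collection": collection,
            "title": (did.findtext("unittitle") or "").strip(),
            "location": location,
        })
    return records


if __name__ == "__main__":
    for record in container_records(SAMPLE_EAD):
        print(record)
```

Emitting one small record per box/folder, rather than one record per finding aid, is what would let a search hit point at a specific container. How such records map onto VUFind's index fields is left open here.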


Received on Wed Oct 27 2010 - 13:35:46 EDT