Re: eebo

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Fri, 5 Jun 2015 09:03:39 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU

On Jun 5, 2015, at 8:20 AM, Ethan Gruber <ewg4xuva_at_GMAIL.COM> wrote:

>> Does anybody here have experience reading the SGML/XML files representing
>> the content of EEBO?
> 
> Are these in TEI? Back when I worked for the University of Virginia
> Library, I did a lot of clean up work and migration of Chadwyck-Healey
> stuff into TEI-P4 compliant XML (thousands of files), but unfortunately all
> of the Perl scripts to migrate old garbage SGML into XML are probably gone.
> 
> How many of these things are really worth keeping, i.e., were not digitized
> by any other organization that has freely published them online?


The data I have comes in two flavors: 1) some variety of SGML, and 2) XML that is TEI-like but not TEI. All of the files are worth keeping because they give me basic bibliographic information (id, author, title, date, keywords/subjects) as well as the transcribed text. (No images.) Given such data, I think I can provide interesting, cool, and “kewl” services. Given the id number, I may then be able to link to the scanned image. Wish me luck. --ELM
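
For the curious, a minimal sketch of pulling those fields out of a single XML record might look like the following. The element names here (`record`, `header`, `author`, `title`, `date`, `keywords`, `term`, `text`, `p`) and the sample values are purely hypothetical stand-ins; the real EEBO markup will differ, and the SGML flavor would need converting to well-formed XML before a parser like this could read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like record. These element names and values are
# assumptions made for illustration, not the actual EEBO schema.
SAMPLE = """
<record id="A00001">
  <header>
    <author>Doe, Jane</author>
    <title>An Example Tract</title>
    <date>1641</date>
    <keywords><term>Sermons</term><term>Sample</term></keywords>
  </header>
  <text><p>First paragraph.</p><p>Second paragraph.</p></text>
</record>
"""


def extract_record(xml_string):
    """Return the basic bibliographic fields and plain text of one record."""
    root = ET.fromstring(xml_string.strip())
    header = root.find("header")
    # Collapse whitespace inside each paragraph, then join the paragraphs.
    paragraphs = [" ".join("".join(p.itertext()).split()) for p in root.iter("p")]
    return {
        "id": root.get("id"),
        "author": header.findtext("author", default=""),
        "title": header.findtext("title", default=""),
        "date": header.findtext("date", default=""),
        "keywords": [term.text for term in header.iter("term")],
        "text": " ".join(paragraphs),
    }


if __name__ == "__main__":
    print(extract_record(SAMPLE))
```

A dictionary keyed on the id would make it straightforward to build the link to a scanned image later, once the URL pattern for the images is known.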
Received on Fri Jun 05 2015 - 09:05:56 EDT