Re: pdf2txt

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Tue, 15 Oct 2013 12:25:07 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU
On Oct 14, 2013, at 4:49 PM, Robert Haschart <rh9ec_at_VIRGINIA.EDU> wrote:

>> For a limited period of time I am making publicly available a Web-based program called PDF2TXT --http://bit.ly/1bJRyh8
> 
> Although based on some subsequent messages where you mention tesseract 
> maybe I misunderstood and your tool only handles pdfs that have already 
> been OCR'ed which would explain why the second document (which only 
> contains page images) fails.

Robert, that's correct. For now, a PDF must already have been OCRed before PDF2TXT can extract its text; documents that contain only page images will fail. --Eric
Received on Tue Oct 15 2013 - 12:25:39 EDT