Re: Scanned PDF to text

From: Kyle Banerjee <kyle.banerjee_at_nyob>
Date: Tue, 9 Dec 2014 08:45:51 -0800
To: CODE4LIB_at_LISTSERV.ND.EDU

> I’m not quite sure if I understand the question, but if all you want to do is pull the text out of an OCR’ed PDF file, then I have found both Tika and PDFtotext to be useful tools....
> 
> On the other hand, if you need to do the OCR itself, then employing Tesseract is probably the way to go. 

To clarify, I have to do the OCR itself. So far I've been using CAM::PDF to extract whatever text already exists.

Kyle
Received on Tue Dec 09 2014 - 11:46:47 EST