Re: pdf2txt

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Tue, 15 Oct 2013 11:45:24 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU
On Oct 14, 2013, at 7:56 AM, Nicolas Franck <Nicolas.Franck_at_UGENT.BE> wrote:

> Could this also be done by Apache Tika? Or do I miss a crucial point?
> 
> http://tika.apache.org/1.4/gettingstarted.html


Nicolas, this looks VERY promising! It seemingly can extract the OCR'd text from a PDF document as well as the text from a Word document. More experimenting is needed, but thank you.

code4lib++

--Eric Morgan
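The pdf2txt job might look something like the following with Tika. This is a minimal sketch, not a tested recipe. It assumes the tika-app jar from the getting-started page has been downloaded and that Java is on the PATH. Instead of calling Tika's Java API, it shells out to the command-line interface and uses its `--text` option to print plain text. The jar file name and the helper names (`tika_command`, `extract_text`, `convert_directory`) are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: path to a locally downloaded tika-app jar; adjust to your copy.
TIKA_JAR = "tika-app-1.4.jar"


def tika_command(jar, source):
    """Build the command line asking tika-app to print a file's plain text."""
    return ["java", "-jar", str(jar), "--text", str(source)]


def extract_text(source, jar=TIKA_JAR):
    """Run tika-app on one file and return the text it writes to stdout."""
    result = subprocess.run(
        tika_command(jar, source),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output using the locale's encoding
        check=True,           # raise CalledProcessError if Tika exits non-zero
    )
    return result.stdout


def convert_directory(directory, jar=TIKA_JAR, suffixes=(".pdf", ".doc", ".docx")):
    """Write a .txt file next to each PDF or Word document in directory."""
    written = []
    for source in sorted(Path(directory).iterdir()):
        if source.is_file() and source.suffix.lower() in suffixes:
            target = source.with_suffix(".txt")
            target.write_text(extract_text(source, jar), encoding="utf-8")
            written.append(target)
    return written
```

For example, `convert_directory("scans")` would write `scans/report.txt` next to `scans/report.pdf` and return the list of text files it created. Because each file gets a separate `java` process, this approach is slow for large batches. Calling Tika from Java, or running it as a server, would avoid the repeated JVM start-up.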
Received on Tue Oct 15 2013 - 11:45:57 EDT