Re: PDF->text extraction

From: Chad Mills <cmmills_at_nyob>
Date: Tue, 21 Jun 2011 10:46:12 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU

We also use pdftotext and have been happy with it.

--
Chad Mills
Programming Coordinator
Ph: 732.932.8573 x123
Fax: 732.932.1386
Cell: 732.309.8538

Rutgers University Libraries
Scholarly Communication Center
Room 409D, Alexander Library
169 College Avenue, New Brunswick, NJ 08901

http://rucore.libraries.rutgers.edu/

----- Original Message -----
From: "Eric Lease Morgan" <emorgan_at_ND.EDU>
To: CODE4LIB_at_LISTSERV.ND.EDU
Sent: Tuesday, June 21, 2011 10:28:39 AM
Subject: Re: [CODE4LIB] PDF->text extraction

On Jun 21, 2011, at 10:23 AM, Owen Stephens wrote:

> We've tried iText but had issues with quality.
> We moved to PDFBox but are having performance issues.

I have been satisfied with pdftotext, which is part of the Xpdf suite of tools -- http://bit.ly/kIHD1x
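
For anyone who wants to script this, below is a minimal sketch of calling pdftotext from Python. It is an illustration only, not anything taken from this thread: it assumes the pdftotext binary is installed and on the PATH, and the helper names build_pdftotext_command and extract_text are hypothetical. The only pdftotext options used are -enc (output encoding), -layout (keep the physical page layout), and "-" as the output file to write to standard output.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for one pdftotext run (hypothetical helper)."""
    command = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        command.append("-layout")
    # A text-file argument of "-" sends the extracted text to stdout.
    command += [pdf_path, "-"]
    return command


def extract_text(pdf_path, keep_layout=False):
    """Return the text of a PDF as a string; assumes pdftotext is installed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would be along the lines of text = extract_text("article.pdf"), where article.pdf is a placeholder file name. Running one process per file keeps the sketch simple; for a large batch, the per-file calls could be spread over a worker pool.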

-- 
Eric Lease Morgan
University of Notre Dame

(574) 631-8604