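Yes. Use iText or PDFBox.
Both are widely used Java PDF libraries that can read a tagged PDF's structure, so the ActualText values can be pulled out and indexed in place of the visible text layer.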
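To show what indexing on ActualText involves at the byte level, here is a minimal sketch in plain Java with no PDF library. It assumes an UNCOMPRESSED, unencrypted PDF. It finds each `/ActualText` entry (in structure elements or in `BDC` marked-content dictionaries), reads the hex or literal string, and decodes it as a PDF text string. For Indic and Southeast Asian scripts this usually means UTF-16BE with a FE FF byte-order mark. The class `ActualTextScan` and its methods are hypothetical and are not part of iText or PDFBox. The sketch does not inflate compressed streams or resolve indirect references such as `/ActualText 12 0 R`, which is why a real library is the practical route.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ActualTextScan {
    // "/ActualText" followed by either a hex string "<...>" or a literal string "(...)".
    private static final Pattern KEY = Pattern.compile("/ActualText\\s*([<(])");

    /** Returns every ActualText value found in the raw bytes of an UNCOMPRESSED PDF. */
    public static List<String> extract(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> values = new ArrayList<>();
        Matcher m = KEY.matcher(raw);
        while (m.find()) {
            byte[] bytes = m.group(1).equals("<") ? readHex(raw, m.end()) : readLiteral(raw, m.end());
            values.add(decodeTextString(bytes));
        }
        return values;
    }

    /** Case-sensitive substring search over ActualText, after NFC normalization of both sides. */
    public static boolean matches(byte[] pdf, String query) {
        String q = Normalizer.normalize(query, Normalizer.Form.NFC);
        for (String text : extract(pdf)) {
            if (Normalizer.normalize(text, Normalizer.Form.NFC).contains(q)) return true;
        }
        return false;
    }

    // Hex string: whitespace is ignored; an odd final digit is treated as if followed by 0.
    private static byte[] readHex(String raw, int i) {
        StringBuilder digits = new StringBuilder();
        for (; i < raw.length() && raw.charAt(i) != '>'; i++) {
            if (Character.digit(raw.charAt(i), 16) >= 0) digits.append(raw.charAt(i));
        }
        if (digits.length() % 2 == 1) digits.append('0');
        byte[] b = new byte[digits.length() / 2];
        for (int k = 0; k < b.length; k++) {
            b[k] = (byte) Integer.parseInt(digits.substring(2 * k, 2 * k + 2), 16);
        }
        return b;
    }

    // Literal string: balanced parentheses plus backslash escapes (\n, \(, \ddd octal, ...).
    private static byte[] readLiteral(String raw, int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int depth = 1; // the opening '(' was already consumed by the regex
        while (i < raw.length()) {
            char c = raw.charAt(i++);
            if (c == '\\' && i < raw.length()) {
                char e = raw.charAt(i++);
                if (e == 'n') out.write('\n');
                else if (e == 'r') out.write('\r');
                else if (e == 't') out.write('\t');
                else if (e == 'b') out.write('\b');
                else if (e == 'f') out.write('\f');
                else if (e >= '0' && e <= '7') { // one to three octal digits
                    int v = e - '0';
                    for (int n = 0; n < 2 && i < raw.length() && raw.charAt(i) >= '0' && raw.charAt(i) <= '7'; n++) {
                        v = v * 8 + (raw.charAt(i++) - '0');
                    }
                    out.write(v & 0xFF);
                } else if (e == '\r') { // backslash + end-of-line is a line continuation
                    if (i < raw.length() && raw.charAt(i) == '\n') i++;
                } else if (e != '\n') out.write(e); // \( \) \\ and unknown escapes: keep the char
            } else if (c == '(') { depth++; out.write(c); }
            else if (c == ')') { if (--depth == 0) break; out.write(c); }
            else out.write(c);
        }
        return out.toByteArray();
    }

    // PDF text strings: UTF-16BE with a FE FF byte-order mark, UTF-8 with EF BB BF (PDF 2.0),
    // otherwise PDFDocEncoding, which this sketch approximates as ISO-8859-1.
    static String decodeTextString(byte[] b) {
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16BE);
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return new String(b, 3, b.length - 3, StandardCharsets.UTF_8);
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) { System.err.println("usage: java ActualTextScan file.pdf"); return; }
        for (String text : extract(Files.readAllBytes(Paths.get(args[0])))) System.out.println(text);
    }
}
```

Running `java ActualTextScan file.pdf` prints one ActualText value per line. Those strings, not the visible text layer, are what would go into the search index. Both sides are NFC-normalized before matching because the same Indic syllable can be stored as different code point sequences. For real, compressed files, PDFBox's `PDStructureElement.getActualText()` reads the same values from the structure tree without any hand parsing.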
On 2/6/16, 2:24 PM, "Code for Libraries on behalf of Andrew Cunningham" <CODE4LIB_at_LISTSERV.ND.EDU on behalf of lang.support_at_GMAIL.COM> wrote:
>Hi all,
>
>I am working with PDF files in some South Asian and South East Asian
>languages. Each PDF has ActualText added for each tag in the PDF. Each PDF
>has ActualText as an alternative for the visible text layer in the PDF.
>
>Is anyone aware of tools that will allow me to index and search PDFs based
>on the ActualText content rather than the visible text layers in the PDF?
>
>Andrew
>
>--
>Andrew Cunningham
>lang.support_at_gmail.com
Received on Mon Feb 08 2016 - 11:57:18 EST