Re: Creating pdfs from images and their text

From: Daron Dierkes <daron.dierkes_at_nyob>
Date: Fri, 17 Jan 2014 11:41:14 -0600
To: CODE4LIB_at_LISTSERV.ND.EDU
But Raffaele, how do you generate the hOCR in the first place if you're
using human-generated transcripts and not OCR?  Hand coding each page would
take forever.


On Fri, Jan 17, 2014 at 3:24 AM, raffaele messuti <raffaele.messuti_at_gmail.com> wrote:

> Padraic Stack wrote:
> > What is a straightforward way to combine the text with overlaid images
> > to create searchable pdfs?
>
> If you have the transcription in hOCR[1] format, the tool you need is
> hocr2pdf[2].
> I have never tried it for PDFs, but years ago I made some DjVu files
> following this tutorial[3].
>
> [1] http://en.wikipedia.org/wiki/HOCR
> [2] http://manpages.ubuntu.com/manpages/lucid/man1/hocr2pdf.1.html
> [3] https://philikon.wordpress.com/2009/07/23/digitizing-books-to-djvu/
>
> ciao.
>
> --
> raffaele
>
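
On the question at the top of this message: if the transcripts are plain
text with one transcript line per line of the page, the hOCR could be
generated by script rather than by hand. The following is only a minimal
sketch under stated assumptions, not a tested workflow. The function name
transcript_to_hocr, the margin parameter, and the evenly spaced line boxes
are all hypothetical; real line positions would have to come from measuring
the page image. Whether hocr2pdf accepts line-level boxes without word-level
ones has not been verified here.

```python
from html import escape


def transcript_to_hocr(lines, image_name, width, height, margin=50):
    """Build a minimal hOCR page from a list of transcript lines.

    Assumption: lines are spread evenly between the top and bottom margins.
    This only approximates where the text sits on the scanned page.
    """
    usable_height = height - 2 * margin
    step = usable_height // max(len(lines), 1)

    spans = []
    for i, text in enumerate(lines):
        top = margin + i * step
        # hOCR bounding boxes are "bbox x0 y0 x1 y1" in image pixels.
        bbox = f"bbox {margin} {top} {width - margin} {top + step}"
        spans.append(
            f'<span class="ocr_line" id="line_{i + 1}" title="{bbox}">'
            f"{escape(text)}</span>"
        )

    page_title = f'image "{escape(image_name)}"; bbox 0 0 {width} {height}'
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '<meta charset="utf-8"/>',
        '<meta name="ocr-system" content="manual transcript"/>',
        '<meta name="ocr-capabilities" content="ocr_page ocr_line"/>',
        "</head>",
        "<body>",
        f"<div class='ocr_page' id='page_1' title='{page_title}'>",
        *spans,
        "</div>",
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    # Hypothetical two-line transcript for a 1000x1100 pixel scan.
    print(transcript_to_hocr(["First line", "Second line"], "page.jpg", 1000, 1100))
```

The output could be saved as page.hocr and then passed to hocr2pdf, along
the lines of "hocr2pdf -i page.jpg -o page.pdf < page.hocr" as described in
the manpage [2].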
Received on Fri Jan 17 2014 - 12:41:50 EST