Re: web-based ocr

From: chris fitzpatrick <chrisfitzpat_at_nyob>
Date: Wed, 13 Mar 2013 15:13:57 +0100
To: CODE4LIB_at_LISTSERV.ND.EDU

I recommend looking at pdfbeads. It's written in Ruby and the documentation
is mostly in Russian ( 
http://rubyforge.org/docman/view.php/9752/10692/pdfbeads.ru.html ), but 
it provides both a library and an easy-to-use executable for building PDFs 
out of hOCR files and images. You literally just point it at a directory 
with page images and hOCR files and it spits out a PDF. Very handy.
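
Before handing a directory to a tool like this, it can help to check that 
every page image actually has an hOCR file next to it. Here is a rough 
Python sketch of that check. The function names, the list of image 
extensions, and the convention that the hOCR file shares the image's base 
name with an .html extension are all my assumptions for illustration, not 
something taken from the pdfbeads documentation, and the sketch does not 
call pdfbeads itself.

```python
from pathlib import Path

# Assumed set of page-image extensions; adjust to match your scans.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def pair_pages(directory, hocr_suffix=".html"):
    """Return (image, hocr_or_None) pairs, sorted by file name.

    Assumes each hOCR file has the same base name as its page image
    (page1.tif -> page1.html). This naming scheme is an assumption.
    """
    pairs = []
    for image in sorted(Path(directory).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip hOCR files and anything else that is not a page image
        hocr = image.with_suffix(hocr_suffix)
        pairs.append((image, hocr if hocr.exists() else None))
    return pairs


def missing_hocr(pairs):
    """List the names of page images that have no matching hOCR file."""
    return [image.name for image, hocr in pairs if hocr is None]
```

Calling missing_hocr(pair_pages("scans")) on a hypothetical "scans" 
directory gives the pages that would end up without a text layer, so you 
can re-run OCR on them first.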

Also, the DIY Book Scanner forum (diybookscanner.org) is a great 
resource if you're into this sort of thing.

Eric Lease Morgan wrote:
>
> On Mar 13, 2013, at 8:07 AM, Ben Brumfield <benwbrum_at_GMAIL.COM> wrote:
>
>>
>> https://github.com/idigbio-aocr/RESTAPI/tree/master/doc
>
>
> Interesting. Printed for future reference. Thank you.
>
> BTW, I did finally get Image::OCR::Tesseract to make, make test, and 
> make install correctly. I did not have the correct/proper libraries 
> installed for Tesseract's supporting Leptonica library. Now I need to 
> find a PDF library similar to libtiff and libpng.
>
> --
> Eric Morgan