Re: web-based ocr

From: Richard Sarvas <Richard.Sarvas_at_nyob>
Date: Tue, 12 Mar 2013 18:16:03 +0000
To: CODE4LIB_at_LISTSERV.ND.EDU

Something like this is on my "to do" list for our future Fedora Commons deployment here at UConn. I was considering wrapping a SOAP interface around something like the Perl Image::OCR::Tesseract module and adding it to our ingest pipeline unless someone can recommend a better OCR application.
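
For illustration only, here is a minimal sketch of the kind of thin wrapper described above, written in Python rather than Perl and calling the tesseract command-line tool directly instead of Image::OCR::Tesseract. The function names and the injectable `runner` parameter are assumptions made for this example; a SOAP or other service layer would sit on top of `ocr_image`.

```python
import subprocess
from typing import Callable, List


def build_tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the argument list for one tesseract CLI call.

    Passing "stdout" as the output base asks tesseract to print the
    recognized text instead of writing <outputbase>.txt.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(
    image_path: str,
    lang: str = "eng",
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run OCR on one image and return the recognized text.

    `runner` is a hypothetical hook so the wrapper can be exercised
    without tesseract installed; by default it is subprocess.run.
    """
    result = runner(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

An ingest step could then call `ocr_image("page1.tif")` for each page image and store the returned text alongside the object, assuming tesseract is on the PATH of the ingest host.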

Rick

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB_at_LISTSERV.ND.EDU] On Behalf Of Till Kinstler
Sent: Tuesday, March 12, 2013 12:30 PM
To: CODE4LIB_at_LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] web-based ocr

On 12.03.2013 16:57, Eric Lease Morgan wrote:

> Does anybody know of something like this that exists already?

We are running something like this. Not with an HTML or RESTful front end, but with WebDAV. The users of this service do "mass digitization". They mount their individual WebDAV share, push scanned image files there, and read the OCR results from output files (usually not "by hand" but with some software that manages their digitization workflow).
The actual OCR is done by an ABBYY Recognition Server; the "WebDAV front end", including accounting, is a straightforward home-brewed solution.
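
As a rough sketch of the client side of such a workflow, assuming the WebDAV share is already mounted as a local directory: the `inbox`/`outbox` folder layout, the `.txt` result suffix, and both function names are hypothetical choices for this example, not details of the actual service.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def submit_scans(scans: List[Path], inbox: Path) -> None:
    """Copy scanned image files into the (assumed) input folder of the share."""
    inbox.mkdir(parents=True, exist_ok=True)
    for scan in scans:
        shutil.copy2(scan, inbox / scan.name)


def collect_results(
    scans: List[Path], outbox: Path, suffix: str = ".txt"
) -> Dict[str, Optional[str]]:
    """Map each scan name to its OCR text, or None if no result exists yet.

    Assumes the server writes one result file per scan, named after the
    scan's stem plus `suffix`, into the (assumed) output folder.
    """
    results: Dict[str, Optional[str]] = {}
    for scan in scans:
        candidate = outbox / (scan.stem + suffix)
        if candidate.exists():
            results[scan.name] = candidate.read_text(encoding="utf-8")
        else:
            results[scan.name] = None
    return results
```

Workflow software would typically call `collect_results` repeatedly until no entry is None, since recognition runs asynchronously on the server.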

Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinstler@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de