Re: Software to OCR Fraktur?

From: Andrew Gray <shimgray_at_nyob>
Date: Mon, 9 Nov 2009 22:24:24 +0000
To: NGC4LIB_at_LISTSERV.ND.EDU

2009/11/9 Simon Spero <ses_at_unc.edu>:
> Tesseract has a profile that was trained on Fraktur (
> http://tesseract-ocr.googlecode.com/files/tesseract-2.01.deu-f.tar.gz)
> I haven't tried it, so I can't say how fast or accurate it is.

I've done a quick demo, using a digitally created piece of blackletter
text and an actual scan:

http://www.generalist.org.uk/fraktur.html

The scan doesn't come out marvellously, but it's not bad, and it's
certainly as good as some "normal" English OCR I've seen! Some words
are garbled, but conversely some whole lines seem to come out
letter-perfect. A better scan - or access to the original files -
would probably help.

Running time is a few seconds for a TIFF file of a few MB; I haven't timed
it for large-scale use, but it looks like something you could easily
batch up and leave overnight.
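
For anyone wanting to try the overnight batch, here is a minimal sketch
in Python rather than anything I have run at scale. It assumes the
tesseract binary is on the PATH, that the Fraktur data from the link
above is installed under the language code "deu-f" (my guess from the
tarball name; it may differ on your install), and that the scans are
.tif files in one directory. The function names and directory names are
made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(image, out_dir, lang="deu-f"):
    """Build a 'tesseract <image> <outputbase> -l <lang>' command line.

    The "deu-f" language code is an assumption based on the tarball name.
    """
    image = Path(image)
    # Tesseract appends the .txt extension to the output base itself.
    out_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(scan_dir, out_dir, lang="deu-f", run=subprocess.run):
    """OCR every .tif in scan_dir; return the names of files that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for image in sorted(Path(scan_dir).glob("*.tif")):
        result = run(build_command(image, out_dir, lang))
        # A non-zero exit status means tesseract reported an error.
        if result.returncode != 0:
            failed.append(image.name)
    return failed


if __name__ == "__main__":
    # Hypothetical directory names; adjust to taste.
    problems = ocr_directory("scans", "ocr-output")
    print("Failed files:", problems)
```

The failed list is there so that one bad scan doesn't stop the whole
run; you can look at the stragglers in the morning.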

Hope that helps!

-- 
- Andrew Gray
  andrew.gray_at_dunelm.org.uk
Received on Mon Nov 09 2009 - 17:29:27 EST