Comparing OCR output to a dictionary

From: Kimberly Kennedy <kimberlymkennedy_at_nyob>
Date: Thu, 2 Sep 2021 16:07:54 -0400
To: CODE4LIB_at_LISTS.CLIR.ORG
Hello!

I was wondering if anyone has created a script or tool that compares the
words in a text file to a dictionary. I'm looking for a way to quantify the
quality of OCR output. I've heard that counting how many words appear in a
dictionary is a good quick-and-dirty way to do this, but I'd like to run it
over large batches of text files so I can compare OCR engines, rather than
counting words manually.
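A minimal sketch of that quick-and-dirty metric in Python. It assumes plain-text OCR output with one .txt file per page or document, one directory per OCR engine covering the same pages, and a word list with one word per line (for example /usr/share/dict/words, on systems that ship one). The function names, the tokenizer pattern, and the directory layout are all hypothetical. Each engine is scored by the share of word tokens found in the word list. Counts are pooled across files, so longer documents weigh more than shorter ones.

```python
import re
import sys
from pathlib import Path

# Hypothetical tokenizer: runs of ASCII letters, allowing internal apostrophes
# ("don't"). Assumes mostly English text; widen the pattern for other scripts.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def load_dictionary(path):
    """Read a word list (one word per line) into a set of lowercase words."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip().lower() for line in f if line.strip()}


def dictionary_ratio(text, dictionary):
    """Return (known, total, ratio) for the word tokens in text."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0, 0, 0.0
    known = sum(1 for w in words if w in dictionary)
    return known, len(words), known / len(words)


def score_directory(directory, dictionary):
    """Score every .txt file in a directory.

    Returns ({filename: ratio}, pooled_ratio). pooled_ratio sums the counts
    over all files, so longer files weigh more than shorter ones.
    """
    per_file = {}
    known_total = word_total = 0
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        known, total, ratio = dictionary_ratio(text, dictionary)
        per_file[path.name] = ratio
        known_total += known
        word_total += total
    pooled = known_total / word_total if word_total else 0.0
    return per_file, pooled


def main(argv):
    if len(argv) < 3:
        print("usage: python ocr_score.py WORDLIST DIR [DIR ...]")
        return
    dictionary = load_dictionary(argv[1])
    # One directory per OCR engine, each holding the same set of pages.
    for directory in argv[2:]:
        per_file, pooled = score_directory(directory, dictionary)
        print(f"{directory}: {pooled:.3f} over {len(per_file)} files")


if __name__ == "__main__":
    main(sys.argv)
```

Run it as, e.g., `python ocr_score.py words.txt engine_a/ engine_b/` (the script name and paths are hypothetical). The ratio is a proxy rather than a true accuracy measure. Proper nouns, historical spellings, and words split by end-of-line hyphens all count as misses, so it is most useful for ranking engines against each other on the same pages rather than as an absolute score.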

Let me know if you have any existing tools or thoughts about how to go
about this!

Thanks,

Kim



Kimberly Kennedy
Digital Production Coordinator
Northeastern University Library
ki.kennedy_at_northeastern.edu
kimberlymkennedy_at_gmail.com
Received on Thu Sep 02 2021 - 15:59:58 EDT