Andrea ~
I've not used it myself, but I have heard from others who do text analysis that Tesseract handles tabular data in scans well: https://github.com/tesseract-ocr/tesseract
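For what it's worth, here is a minimal, untested-on-real-scans sketch of how that pipeline could look in Python. It assumes the third-party `pdf2image` and `pytesseract` packages (plus the Poppler and Tesseract binaries) are installed. Tesseract returns words with bounding boxes rather than table cells, so the grouping step is a simple hypothetical heuristic: words whose vertical centers fall within `row_tol` pixels share a row, and a horizontal gap wider than `col_gap` pixels starts a new cell. Both thresholds, the 300 dpi setting, and the `--psm 6` page-segmentation choice are guesses that would need tuning for these particular scans.

```python
import csv
import io
from typing import Any, Dict, List


def words_from_ocr(data: Dict[str, List[Any]], min_conf: float = 0.0) -> List[Dict[str, Any]]:
    """Keep non-empty words from pytesseract's image_to_data(..., Output.DICT) result."""
    words = []
    for i, text in enumerate(data["text"]):
        text = str(text).strip()
        # Non-word levels (page/block/line) have empty text; also drop low-confidence words.
        if not text or float(data["conf"][i]) < min_conf:
            continue
        words.append({
            "text": text,
            "left": int(data["left"][i]),
            "top": int(data["top"][i]),
            "width": int(data["width"][i]),
            "height": int(data["height"][i]),
        })
    return words


def group_rows(words: List[Dict[str, Any]], row_tol: float) -> List[List[Dict[str, Any]]]:
    """Group words into rows by vertical center, then sort each row left to right."""
    rows: List[Dict[str, Any]] = []
    for w in sorted(words, key=lambda w: w["top"] + w["height"] / 2):
        center = w["top"] + w["height"] / 2
        if rows and abs(center - rows[-1]["center"]) <= row_tol:
            rows[-1]["words"].append(w)
        else:
            rows.append({"center": center, "words": [w]})
    return [sorted(r["words"], key=lambda w: w["left"]) for r in rows]


def split_cells(row: List[Dict[str, Any]], col_gap: float) -> List[str]:
    """Start a new cell whenever the gap to the previous word exceeds col_gap pixels."""
    cells = [[row[0]["text"]]]
    prev_right = row[0]["left"] + row[0]["width"]
    for w in row[1:]:
        if w["left"] - prev_right > col_gap:
            cells.append([w["text"]])
        else:
            cells[-1].append(w["text"])
        prev_right = w["left"] + w["width"]
    return [" ".join(c) for c in cells]


def ocr_dict_to_csv(data: Dict[str, List[Any]], row_tol: float = 10,
                    col_gap: float = 25, min_conf: float = 0.0) -> str:
    """Turn one page of Tesseract word boxes into CSV text (heuristic, no column alignment)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in group_rows(words_from_ocr(data, min_conf), row_tol):
        writer.writerow(split_cells(row, col_gap))
    return buf.getvalue()


def pdf_to_csv(pdf_path: str, out_prefix: str, dpi: int = 300) -> None:
    """Rasterize each PDF page, OCR it, and write one CSV per page (needs Poppler + Tesseract)."""
    from pdf2image import convert_from_path
    import pytesseract

    for page_no, image in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        # --psm 6 treats the page as one uniform block of text; other modes may suit some tables better.
        data = pytesseract.image_to_data(image, config="--psm 6",
                                         output_type=pytesseract.Output.DICT)
        with open(f"{out_prefix}_page{page_no}.csv", "w", newline="", encoding="utf-8") as fh:
            fh.write(ocr_dict_to_csv(data))
```

Calling `pdf_to_csv("scans.pdf", "tables")` would write `tables_page1.csv`, `tables_page2.csv`, and so on. Because empty cells are not detected, columns can shift in ragged rows, so the output of older, skewed, or faded scans would still need a manual check.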
~ Amy
--
Amy J. Kirchhoff (she/her)
Constellate Text Analytics Business Manager / Portico, JSTOR
Twitter: @AmyPlusFour
Find out about user interface releases, text analytics classes, and other updates in our email group (https://ithaka.groups.io/g/tdm-jstor-portico).
-----Original Message-----
From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> On Behalf Of Medina-Smith, Andrea M. (Fed)
Sent: Tuesday, June 21, 2022 2:47 PM
To: CODE4LIB_at_LISTS.CLIR.ORG
Subject: [CODE4LIB] Converting old tables in PDF to CSV
Hello List,
Has anyone had success converting tables in a PDF to CSV? These are scans of paper documents from the '70s onward. I know this isn’t a super easy conversion, but I would think it’s not impossible either.
Thanks,
Andrea
--
Andrea Medina-Smith
Data Librarian
Information Services Office
National Institute of Standards and Technology
andrea.medina-smith_at_nist.gov
https://orcid.org/0000-0002-1217-701X
Received on Tue Jun 21 2022 - 16:33:26 EDT