What's the current best for extracting tables from OCR'd text?

From: Pikas, Christina K. <Christina.Pikas_at_nyob>
Date: Fri, 3 Apr 2026 11:28:22 +0000
To: CODE4LIB_at_LISTS.CLIR.ORG
Hi All,
Good morning from foggy Maryland. Materials scientists and aerospace engineers have technical reports from the 1970s and 1980s (and probably earlier) with vast tables of experimental data. I'm pasting a picture below to give a flavor:
[image: image002.png]

I can OCR this with Acrobat Pro, but what's the current best tool for extracting the table? We can't upload this to a commercial service, and our on-prem AI models went "NOPE!" I tried Tabula, which did OK with some of the tables sprinkled through the text but not with ones like the one shown in the image. There appear to be a number of tools intended for AI and RAG (like Docling). Does anyone have experience using these for this purpose?

If it's a paid service, I'm also interested, depending on a number of factors.

Thanks in advance,

Christina



Christina K. Pikas, PhD
Principal Professional Staff
Johns Hopkins Applied Physics Laboratory
11100 Johns Hopkins Rd, Laurel, MD 20723
O: (240) 228-4812

Received on Fri Apr 03 2026 - 07:47:10 EDT