Re: Converting old tables in PDF to CSV

From: Pikas, Christina K. <Christina.Pikas_at_nyob>
Date: Tue, 21 Jun 2022 19:04:18 +0000
To: CODE4LIB_at_LISTS.CLIR.ORG
Tabula is pretty miraculous at turning hamburgers back into cows, but scans from the 70s are a lot to ask. Still, I would try it: https://tabula.technology/


Christina

-----Original Message-----
From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> On Behalf Of Haitz, Lisa (haitzlm)
Sent: Tuesday, June 21, 2022 3:02 PM
To: CODE4LIB_at_LISTS.CLIR.ORG
Subject: [EXT] Re: [CODE4LIB] Converting old tables in PDF to CSV

Acrobat (full version) has an export-to-Excel function. I’ve used it before, and my table data was exported correctly, with each value in its own Excel cell.

😊

From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> on behalf of Matt Sherman <matt.r.sherman_at_GMAIL.COM>
Date: Tuesday, June 21, 2022 at 2:53 PM
To: CODE4LIB_at_LISTS.CLIR.ORG <CODE4LIB_at_LISTS.CLIR.ORG>
Subject: Re: [CODE4LIB] Converting old tables in PDF to CSV


Hm, that should be doable, but an annoying amount of work. I haven't done it with tables but I have done it with bibliographic records and regex.
Helps if there is a very consistent structure to the OCR.
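
For what it's worth, the regex route might look something like the rough sketch below. It is only an illustration, assuming the OCR text comes out one table row per line, with a text label followed by two numeric columns separated by runs of spaces. The sample text, the column names, and the ocr_text_to_csv helper are all made up for the example; a real scan would need its own pattern.

```python
import csv
import io
import re

# Hypothetical OCR output: a label, then two numeric columns,
# with a header row and a stray page-number line mixed in.
SAMPLE_OCR = """\
Sample     Temp (K)    Pressure (kPa)
Iron         293.15        101.3
Copper       300.00         99.8
Page 12
Zinc         310.5         100.1
"""

# A row is: a label, at least two spaces, then two decimal numbers.
ROW = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z .-]*?)\s{2,}"
    r"(?P<temp>\d+(?:\.\d+)?)\s+"
    r"(?P<pressure>\d+(?:\.\d+)?)\s*$"
)


def ocr_text_to_csv(text):
    """Return (csv_text, skipped_lines) for OCR text shaped like SAMPLE_OCR."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "temp", "pressure"])
    skipped = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            writer.writerow([match["name"], match["temp"], match["pressure"]])
        elif line.strip():
            # Keep non-matching lines so they can be reviewed by hand.
            skipped.append(line)
    return out.getvalue(), skipped


csv_text, skipped = ocr_text_to_csv(SAMPLE_OCR)
print(csv_text)
print("Lines needing manual review:", skipped)
```

Collecting the lines that do not match, instead of dropping them silently, is what makes this workable on messy OCR: the skipped list shows where the structure was not as consistent as hoped.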

On Tue, Jun 21, 2022 at 1:47 PM Medina-Smith, Andrea M. (Fed) < 000000b92eca49be-dmarc-request_at_lists.clir.org> wrote:

> Hello List,
>
> Has anyone had success converting tables in a PDF to CSV? These are 
> scans of paper from the 70s on forward. I know this isn’t a super easy 
> conversion, but I would think it’s not impossible either.
>
> Thanks,
> Andrea
>
> --
>
> Andrea Medina-Smith
> Data Librarian
> Information Services Office
> National Institute of Standards and Technology 
> andrea.medina-smith_at_nist.gov<mailto:andrea.medina-smith_at_nist.gov>
> https://orcid.org/0000-0002-1217-701X
>
>
>
Received on Tue Jun 21 2022 - 14:55:58 EDT