Re: scraping or extracting structured data from a pdf

From: Joe Hourclé <oneiros_at_nyob>
Date: Thu, 12 May 2022 15:12:05 -0400
To: CODE4LIB_at_LISTS.CLIR.ORG
> 
> On May 12, 2022, at 2:40 PM, Danielle Reay <dreay_at_drew.edu> wrote:
> 
> Hello,
> 
> We have a faculty member looking to create a dataset from an annotated
> bibliography she compiled. Right now it exists as a Word file and as a PDF.
> The entries are relatively structured with a citation and an abstract, but
> the document is about 150 pages long with multiple entries per page. Rather
> than manually copy and paste everything to create the spreadsheet/csv, I
> wanted to ask for suggestions or approaches to doing this by either
> scraping or extracting structured data from the PDF. Thanks very much in
> advance!
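
For illustration only, here is a minimal sketch of one possible route. It is not a method taken from this thread. It assumes the Word file has first been saved as plain text, that entries are separated by blank lines, and that the first line of each entry is the citation and the remaining lines are the abstract. The real document may well be laid out differently, and the function names parse_entries and entries_to_csv are hypothetical. It uses only the Python standard library.

```python
import csv
import io
import re


def parse_entries(text):
    """Split plain text into (citation, abstract) pairs.

    Assumption: entries are separated by one or more blank lines, the
    first line of each entry is the citation, and the rest is the abstract.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Drop empty lines and surrounding whitespace inside the entry.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        citation = lines[0]
        # Re-join a hard-wrapped abstract into a single line.
        abstract = " ".join(lines[1:])
        entries.append((citation, abstract))
    return entries


def entries_to_csv(entries):
    """Return the entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["citation", "abstract"])
    writer.writerows(entries)
    return buffer.getvalue()


# Made-up sample text standing in for the exported bibliography.
sample = """Smith, J. (2001). A Title. Publisher.
First line of the abstract
continues here.

Doe, A. (2010). Another Title. Journal, 5(2), 1-10.
A one-line abstract.
"""

print(entries_to_csv(parse_entries(sample)))
```

The csv module takes care of quoting, so commas inside a citation do not break the columns. If the citation and abstract are separate paragraphs in the real file, the splitting rule would need to change accordingly.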


I personally 

Sent from a mobile device with a crappy on screen keyboard and obnoxious "autocorrect"
Received on Thu May 12 2022 - 15:05:45 EDT