From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> on behalf of Eric Lease Morgan <00000107b9c961ae-dmarc-request_at_LISTS.CLIR.ORG>
Sent: Friday, October 31, 2025 7:25:27 PM
To: CODE4LIB_at_LISTS.CLIR.ORG <CODE4LIB_at_LISTS.CLIR.ORG>
Subject: Re: [CODE4LIB] DOI Citation Verifier MCP Server [open alex]
I have had pretty good success with Open Alex; given a list of DOIs, I have used Open Alex to augment citations with robust bibliographic values, download the associated PDF files, and measure the impact of the cited documents.
For example, suppose I have a CSV file (e.g., survey-results.csv) and one of its columns is named "doi". I can feed that CSV file to dois2metadata.py, and it will output a second CSV file, metadata.csv. This second CSV file includes robust values for authors, titles, dates, and, more importantly, locations where the full text can be obtained.
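As a rough illustration of this first step (a sketch only, not the dois2metadata.py linked below), the following Python reads the "doi" column, builds an Open Alex lookup URL for each DOI, and flattens a returned record into one row of metadata. The function names and the choice of output columns are assumptions made for illustration. The field names (authorships, publication_year, best_oa_location) are those of the Open Alex work record.

```python
import csv
import json
import urllib.parse
import urllib.request

API = "https://api.openalex.org/works/"


def normalize_doi(value):
    """Reduce 'https://doi.org/10.x/y' or 'doi:10.x/y' to a bare, lowercase DOI."""
    doi = value.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def read_dois(lines, column="doi"):
    """Return the unique, non-empty DOIs in the named column, in file order."""
    dois = []
    for row in csv.DictReader(lines):
        doi = normalize_doi(row.get(column) or "")
        if doi and doi not in dois:
            dois.append(doi)
    return dois


def work_url(doi, mailto=None):
    """Build the lookup URL for one DOI; mailto is optional contact information."""
    url = API + "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    if mailto:
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    return url


def fetch_work(doi, mailto=None):
    """Download and parse one work record (requires network access)."""
    with urllib.request.urlopen(work_url(doi, mailto)) as response:
        return json.load(response)


def summarize(work):
    """Flatten a work record into one row of bibliographic values."""
    authors = [a["author"]["display_name"] for a in work.get("authorships") or []]
    location = work.get("best_oa_location") or {}
    return {
        "doi": work.get("doi"),
        "title": work.get("title"),
        "year": work.get("publication_year"),
        "authors": "; ".join(authors),
        "pdf_url": location.get("pdf_url"),
    }
```

With an open file handle on survey-results.csv, read_dois(handle) yields the DOIs, and summarize(fetch_work(doi)) yields one row per DOI that could then be written out with csv.DictWriter.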
Once I get this far, I loop through the metadata file to get Open Alex "works" files. These are JSON files with even more detail including author titles, author affiliations, lists of associated controlled vocabulary terms, and sometimes abstracts. I use dois2works.py for this purpose, and it creates a directory of "works" files. I can use another script, cache-pdfs.py, which loops through the same metadata file and caches PDF files, if they are available.
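The caching step might be sketched as follows. This is an assumption-laden illustration rather than the cache-pdfs.py linked below: the function names, the "pdfs" directory, and the file-naming scheme are all hypothetical. It relies only on the pdf_url values that Open Alex work records carry in their best_oa_location, primary_location, and locations fields.

```python
import pathlib
import urllib.request


def pdf_url(work):
    """Return the first PDF URL found in a work record, or None."""
    candidates = [work.get("best_oa_location"), work.get("primary_location")]
    candidates.extend(work.get("locations") or [])
    for location in candidates:
        if location and location.get("pdf_url"):
            return location["pdf_url"]
    return None


def cache_name(work, suffix):
    """Name a cached file after the trailing part of the Open Alex id."""
    return work["id"].rsplit("/", 1)[-1] + suffix


def cache_pdf(work, directory="pdfs"):
    """Download the PDF once; return its path, or None if no PDF is listed."""
    url = pdf_url(work)
    if url is None:
        return None
    target = pathlib.Path(directory) / cache_name(work, ".pdf")
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())
    return target
```

Skipping files that already exist means the loop can be re-run after a failure without downloading everything again.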
Open Alex also has "source" files, and these files include citation metrics: number of times cited, h-index, etc. I can get these source files using works2sources.py, and the results are saved in the sources directory.
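To make the shape of these metrics concrete, here is a minimal sketch (not the works2sources.py linked below) of how a work record points to its source and how a few metrics could be pulled from a source record. The function names and output keys are hypothetical. The summary_stats fields (h_index, i10_index, 2yr_mean_citedness) are the ones found in Open Alex source records.

```python
API = "https://api.openalex.org/sources/"


def source_id(work):
    """Return the id of the source (journal, repository) of a work, or None."""
    location = work.get("primary_location") or {}
    source = location.get("source") or {}
    return source.get("id")


def source_url(identifier):
    """Turn 'https://openalex.org/S123' into the matching API URL."""
    return API + identifier.rsplit("/", 1)[-1]


def source_metrics(source):
    """Pick the citation metrics out of a source record."""
    stats = source.get("summary_stats") or {}
    return {
        "source": source.get("display_name"),
        "works_count": source.get("works_count"),
        "cited_by_count": source.get("cited_by_count"),
        "h_index": stats.get("h_index"),
        "i10_index": stats.get("i10_index"),
        "two_year_mean_citedness": stats.get("2yr_mean_citedness"),
    }
```

Works without a primary location or source simply yield None, so they can be skipped when building the list of sources to download.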
Finally, I can combine the "works" files with the "source" files to create yet another CSV file, bibliometrics.csv, which includes rudimentary identifiers and various citation scores.
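One way to picture the combination is a simple join on the source identifier. The sketch below is an assumption, not the script that actually produces bibliometrics.csv: the column names are hypothetical, and the works and sources are taken to be already-parsed records, with the sources held in a dictionary keyed by id.

```python
import csv
import io

FIELDS = ["work_id", "doi", "work_cited_by_count", "source_id", "source_h_index"]


def bibliometric_row(work, sources):
    """Join one work record to its source record, if the source is known."""
    source = (work.get("primary_location") or {}).get("source") or {}
    stats = (sources.get(source.get("id")) or {}).get("summary_stats") or {}
    return {
        "work_id": work.get("id"),
        "doi": work.get("doi"),
        "work_cited_by_count": work.get("cited_by_count"),
        "source_id": source.get("id"),
        "source_h_index": stats.get("h_index"),
    }


def to_csv(works, sources):
    """Return CSV text with one row per work; missing values are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for work in works:
        writer.writerow(bibliometric_row(work, sources))
    return buffer.getvalue()
```

Works whose source is missing still get a row, just with empty source columns, so no cited document silently drops out of the result.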
How might I use this suite of software? Well, I might use the Web Of Science API to search... Web Of Science. This results in a CSV file of authors, titles, dates, and sometimes DOIs. I can then feed the CSV file to this suite of software to augment the results and actually get the articles. Once I get abstracts and/or articles, I can then create a data set against the whole for the purposes of analysis (distant reading). Thus, using this system, I can literally read hundreds if not thousands of articles.
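A practical detail when building such a data set from abstracts: Open Alex work records store an abstract as an inverted index (each word mapped to the positions where it occurs) rather than as running text. A minimal sketch for turning that back into plain text follows; the function name is hypothetical.

```python
def abstract_text(work):
    """Rebuild running text from a work's abstract_inverted_index, if present."""
    index = work.get("abstract_inverted_index")
    if not index:
        return ""
    positions = {}
    for word, places in index.items():
        for place in places:
            positions[place] = word
    # Emit the words in positional order.
    return " ".join(positions[key] for key in sorted(positions))
```

For example, abstract_text({"abstract_inverted_index": {"the": [0, 3], "cat": [1], "saw": [2], "dog": [4]}}) returns "the cat saw the dog".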
In summary, given a list of DOIs, it is possible to use Open Alex to get robust bibliographic data, the associated full text, as well as bibliometrics. Moreover, there are no dollar costs associated with Open Alex. It is really "free as in free beer".
All of the scripts and intermediate files created by this software suite are temporarily linked below. And yes, I know people do not like obfuscated links, but if I didn't use a shortener, then the links would be embarrassingly long:
* survey-results.csv - https://bit.ly/3JpI6Xk
* dois2metadata.py - https://bit.ly/3Wu1TI5
* metadata.csv - https://bit.ly/3JBI1Qa
* dois2works.py - https://bit.ly/43agPPb
* directory of works files - https://bit.ly/3JnJDg
* cache-pdfs.py - https://bit.ly/4oMgMBl
* works2sources.py - https://bit.ly/432xRPm
* sources - https://bit.ly/3JqOVrC
* works2sources.py - https://bit.ly/4oNO5UW
* works2matrix.py - https://bit.ly/439hmku
* bibliometrics.csv - https://bit.ly/43KgXFe
(On the other hand, your email provider may very well obfuscate the links a different way, whether you like it or not. Dumb, if you ask me, but that is a different discussion.)
--
Eric Lease Morgan
Librarian Emeritus, University of Notre Dame
Received on Fri Oct 31 2025 - 16:00:08 EDT