distant reading against code4lib journal

From: Eric Lease Morgan <00000107b9c961ae-dmarc-request_at_nyob>
Date: Thu, 16 Oct 2025 17:30:22 +0200
To: CODE4LIB_at_LISTS.CLIR.ORG

I have begun to do some distant reading against the whole of Code4Lib Journal.

I first harvested the whole journal, parsed it into individual articles, and created a data set from the result. I then extracted bunches o' features, and created a descriptive statistics-like report. [1] From the page you can see I collected about 550 articles for a total of 2.4 million words.

After doing rudimentary frequency analysis against computed keywords, I visualized the results. This is one way to describe the aboutness of the Journal. [2]
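For the curious, rudimentary frequency analysis of this sort can be sketched in a few lines of Python. This is only an illustration, not the process used to create the linked visualization; the sample keyword lists and the function name are made up, and it assumes each article has already been reduced to a list of computed keywords.

```python
from collections import Counter

# Hypothetical input: one list of computed keywords per article.
articles = [
    ["metadata", "data", "library"],
    ["data", "project", "metadata"],
    ["project", "data"],
]


def keyword_frequencies(keyword_lists):
    """Count the number of articles in which each keyword appears."""
    counts = Counter()
    for keywords in keyword_lists:
        # Use a set so a keyword is counted at most once per article.
        counts.update(set(keywords))
    return counts


frequencies = keyword_frequencies(articles)

# List the most common keywords first.
for keyword, count in frequencies.most_common(3):
    print(keyword, count)
```

The resulting counts are the sort of thing that can be fed to a word cloud or bar chart.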

I then applied topic modeling over the whole. The resulting topics were very similar to the frequencies. I then plotted the topics over time to see how they ebbed and flowed. The "project" topic dominates. The "data" and "metadata" topics are close seconds. [3]
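Plotting topics over time boils down to tallying topic assignments by year. The sketch below does not do the topic modeling itself; it assumes some topic model has already labeled each article with a single dominant topic, and the years, labels, and function name are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical input: (publication year, dominant topic) for each article,
# as might be produced by a topic model run beforehand.
assignments = [
    (2008, "project"),
    (2008, "data"),
    (2009, "project"),
    (2009, "project"),
    (2009, "metadata"),
]


def topic_shares_by_year(pairs):
    """Return, for each year, the share of articles assigned to each topic."""
    by_year = defaultdict(Counter)
    for year, topic in pairs:
        by_year[year][topic] += 1

    shares = {}
    for year, counts in sorted(by_year.items()):
        total = sum(counts.values())
        shares[year] = {topic: n / total for topic, n in counts.items()}
    return shares


shares = topic_shares_by_year(assignments)
for year, topics in shares.items():
    print(year, topics)
```

Each year's shares can then be drawn as one point per topic on a line chart, which is one way to see topics ebb and flow.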

The Journal's articles and keywords can be seen as nodes and edges in a network graph. I created such a graph and visualized the results. Again, to some degree, the results echo the previous observations; the themes of "metadata", "information", and "data" are large. [4]
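As a minimal sketch of the idea, articles and keywords can be modeled as nodes, with an edge wherever an article carries a keyword. The identifiers and keywords below are hypothetical, and the plain dictionary of sets stands in for whatever graph library one prefers.

```python
from collections import defaultdict

# Hypothetical input: article identifiers mapped to their keywords.
article_keywords = {
    "article-1": ["metadata", "data"],
    "article-2": ["data", "information"],
    "article-3": ["metadata", "data", "information"],
}


def build_graph(mapping):
    """Build an undirected article-keyword graph as an adjacency dictionary."""
    graph = defaultdict(set)
    for article, keywords in mapping.items():
        for keyword in keywords:
            graph[article].add(keyword)
            graph[keyword].add(article)
    return graph


graph = build_graph(article_keywords)

# The degree of a keyword node is the number of articles connected to it;
# keywords with large degrees are the "large" themes in a visualization.
degrees = {
    node: len(neighbors)
    for node, neighbors in graph.items()
    if node not in article_keywords
}
print(sorted(degrees.items(), key=lambda item: item[1], reverse=True))
```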

Since "data" and "metadata" were common themes, I wanted to see how they were defined. Linked is a concordance for the phrase "metadata is", but alas, I saw few definitions. [5]
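A concordance, also known as a keyword-in-context index, is simple to approximate. The following is a sketch only, not the tool behind the linked page; the sample text, the function name, and the 30-character window are assumptions.

```python
import re


def concordance(text, phrase, width=30):
    """Return each occurrence of phrase with width characters of context."""
    # Collapse runs of whitespace so line breaks do not disturb the layout.
    text = " ".join(text.split())
    lines = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the matched phrase lines up in a column.
        lines.append(f"{left:>{width}}{match.group(0)}{right:<{width}}")
    return lines


# Hypothetical sample text.
sample = (
    "Metadata is data about data. Some people say "
    "metadata is a love note to the future."
)

hits = concordance(sample, "metadata is")
for hit in hits:
    print(hit)
```

Reading down the column to the right of the phrase is a quick way to spot definitions, or the lack of them.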

I have begun to do similar things with three additional journals about digital libraries: DLib Magazine, Ariadne, and ITAL. [6, 7, 8] In the end I hope to: 1) compare & contrast the journals, and 2) query them all for definitions of things like "libraries", "librarianship", "information", and "knowledge".

Finally, all of the processes outlined above can be done against any set of narrative texts. Examples include: dozens of books, hundreds of journal articles, or thousands of journal article citations complete with abstracts.

Fun!


Links

[1] report - https://bit.ly/477kVbV
[2] keyword frequencies - https://bit.ly/4h9HJMM
[3] topics over time - https://bit.ly/3KR8ZDU
[4] network - https://bit.ly/4oj9Lrv
[5] "metadata is" - https://bit.ly/476TvD6
[6] DLib Magazine - https://bit.ly/47aXX3U
[7] Ariadne - https://bit.ly/4ogeOcc
[8] ITAL - https://bit.ly/47sDKYv

--
Eric Morgan
Librarian Emeritus, University of Notre Dame
Received on Thu Oct 16 2025 - 11:31:38 EDT