Re: Code4Lib Journal Issue 45 [topic modeled]

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Fri, 9 Aug 2019 16:53:41 -0400
To: CODE4LIB_at_LISTS.CLIR.ORG

On Aug 9, 2019, at 4:28 PM, Eric Hanson <ehanson_at_MIT.EDU> wrote:

> The newest issue of code4Lib Journal is now available - https://journal.code4lib.org/issues/issues/issue45


It is nice to see our journal thrive.

Our journal is also great fodder for "distant reading", so I took it upon myself to "read it from afar". I was happy to see that the topic modeling seemed to work well:

  http://carrels.distantreader.org/library/code4lib-45/index.htm#topic-modeling

More specifically, I requested five topics for the issue, and the following "themes" were returned, each paired with its most closely associated article title:

  1. ar, data, sh - Programming Poetry: Using a Poem Printer and
     Web Programming to Build Vandal Poem of the Day

  2. org, rightsstatements, copyright - Consortial
     RightsStatements.org Implementation and Faceted Search for Reuse
     Rights in Digital Library Materials

  3. terms, library, video - Generating Geographic Terms for
     Streaming Videos Using Python: A Comparative Analysis

  4. sinopia, react, component - Developing Sinopia’s Linked-Data
     Editor with React and Redux

  5. ohsu, search, publications - Building an institutional author
     search tool

The extraction of statistically significant keywords worked well too, but some of them ought to be denoted as stop words:

  http, libraries, library, coding, collecting, component,
  copyright, data, digitization, digitizing, falsc, functioning,
  like, metadata, method, ohsu, people, poems, poetry, policy,
  prints, publication, rdf, react, rightsstatement, searching,
  shacl, sinopia, syndrome, term, video, web, williamson, wordpress
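For anyone curious how stop words change such a keyword list, here is a minimal sketch. It assumes plain TF-IDF scoring, which may well differ from what the Distant Reader actually does, and the STOP_WORDS set, tokenize(), and keywords() are hypothetical names made up for illustration. Only "http", "like", and "people" are taken from the list above; the rest of the stop words are ordinary English function words.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not the Distant Reader's own).
# "http", "like", and "people" appear in the keyword list above.
STOP_WORDS = {"http", "like", "people", "the", "a", "and", "of",
              "to", "in", "is", "for", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def keywords(documents, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in documents]
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(documents)
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n / df[t])
                  for t, c in tf.items()}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results
```

Calling keywords() on a list of article texts returns one short keyword list per article, and anything added to STOP_WORDS never shows up in the output.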

Fun with indexing.

--
Eric Morgan
University of Notre Dame
Received on Fri Aug 09 2019 - 16:59:17 EDT