Re: llm technology is especially useful

From: <patrickpletcher_at_nyob>
Date: Mon, 1 Jun 2026 21:13:51 -0500
To: CODE4LIB_at_LISTS.CLIR.ORG

First-time poster. I am trying to teach myself to code, and I have found that Claude is very useful when you have very specific technical questions and no expert on call, such as a professor, to ask. I don't think what I am doing would be possible without it.

On May 29, 2026 9:17:41 AM CDT, Eric Lease Morgan <00000107b9c961ae-dmarc-request_at_LISTS.CLIR.ORG> wrote:
>I assert the use of LLM technology is especially useful -- an accurate time-saver -- when it is applied to one's own content.
>
>For example, I scraped one year's worth of Code4Lib mailing list postings, 2008 to be exact. I then used natural language processing to count and tabulate different types of words from the postings: unigrams, bigrams, keywords, parts-of-speech words, and named-entities. I then fed these counts and tabulations to an LLM (gpt-oss:120b-cloud accessed through Ollama) and asked the model to characterize the postings. Here is the result:
>
>  The code4lib‑email carrel reads like a dynamic,
>  conversational forum dedicated to library‑technology. Its
>  lexical profile is dominated by first‑person pronouns and
>  informal markers, indicating a community that constantly
>  exchanges ideas and asks for help.  Core nouns such as
>  library, web, metadata, software, and API appear with high
>  frequency, while adjectives like open, digital, free, and
>  available stress a strong commitment to open‑source and
>  accessible solutions.  Frequently occurring verbs (use,
>  share, provide, think, ask) and bigrams such as "open
>  source," "I think," and "you want" reveal an action‑oriented
>  dialogue centered on problem‑solving, code sharing, and
>  collaborative refinement of library services.
>  
>
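The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code that produced the quoted result: the tokenizer is a naive regular expression, `postings` is a hypothetical two-item stand-in for the scraped 2008 archive, and the prompt wording is invented. Keywords, parts of speech, and named entities would need a real NLP library and are left out, as is the call to the model itself.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_ngrams(texts, n):
    """Count n-grams (tuples of n consecutive tokens) across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n staggered views of the token list yields the n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def tabulate(counts, top=10):
    """Format the most common n-grams as tab-separated lines."""
    return "\n".join(
        f"{' '.join(gram)}\t{count}" for gram, count in counts.most_common(top)
    )


# Hypothetical stand-ins for the scraped postings (assumption).
postings = [
    "I think open source software is useful.",
    "Open source tools help the library share metadata.",
]

unigrams = count_ngrams(postings, 1)
bigrams = count_ngrams(postings, 2)

# The tabulated counts, not the raw postings, become the model's input.
# The prompt wording here is an assumption.
prompt = (
    "Characterize the postings described by these counts:\n\n"
    "Unigrams:\n" + tabulate(unigrams) + "\n\n"
    "Bigrams:\n" + tabulate(bigrams)
)
print(prompt)
```

Sending only the counts keeps the prompt small even when the corpus is large, which is presumably part of why this approach is quick.
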
>I wanted to know about open source software, so I: 1) identified a set of sentences containing the phrase "open source software", 2) used the LLM to summarize the result, and 3) asked it to address the question "What are the advantages and disadvantages of open source software?". Here are the results:
>
>  This text is a collection of discussions and announcements
>  centered around open source software in the context of
>  libraries, drawn from what appears to be a mailing list or
>  forum (possibly CODE4LIB). Key themes include the promotion
>  and benefits of open source software—emphasizing user
>  freedom, community collaboration, and independence from
>  proprietary vendors—as well as practical challenges like
>  software compilation, distribution, and implementation in
>  library settings.
>    
>  Based on the provided text, the primary advantage of open
>  source software is the freedom it grants users through the
>  absence of proprietary control. This freedom allows the code
>  to be modified by anyone to suit their needs, liberating
>  users from being at the mercy of closed-source vendors with
>  their forced upgrades and compatibility issues. It is
>  described as being about community, where usability takes
>  priority over profitability. A noted disadvantage, however,
>  is that the open-source model shifts most of the costs from
>  acquisition to operations and maintenance. Furthermore,
>  introducing open source development in organizations
>  accustomed to vendor-based relationships requires
>  significant change management.
>  
>  
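The sentence-selection step behind the open source software results might look something like the following. It is a minimal sketch, assuming a naive punctuation-based sentence splitter and a hypothetical `postings` list; the function name `sentences_with_phrase` and the prompt wording are made up for illustration, and the original workflow may well have used a proper sentence tokenizer.

```python
import re


def sentences_with_phrase(texts, phrase):
    """Return the sentences that contain the phrase, ignoring case."""
    needle = phrase.lower()
    matches = []
    for text in texts:
        # Naive splitter: break after ., !, or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if needle in sentence.lower():
                matches.append(sentence)
    return matches


# Hypothetical stand-ins for the scraped postings (assumption).
postings = [
    "We moved to open source software last year. The migration took months.",
    "MARC is hard to parse. Open source software helped us convert it!",
]

selected = sentences_with_phrase(postings, "open source software")

# The selected sentences become the context for the summary and the question.
question = "What are the advantages and disadvantages of open source software?"
prompt = (
    "Summarize the following sentences, then answer: "
    + question
    + "\n\n"
    + "\n".join(selected)
)
print(prompt)
```

The same function would serve the MARC question by changing the phrase and the question text.
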
>I did the same thing but this time regarding MARC: What are the advantages and disadvantages of MARC? Here are the results:
>
>  This email thread is a sprawling discussion among librarians
>  and tech folks on the Code4Lib mailing list, focusing
>  heavily on MARC records—the standard format for library
>  catalog data. The conversation covers a range of topics,
>  including technical challenges with processing MARC data
>  (like character encoding issues and parsing), tools for
>  working with MARC (such as MarcEdit, MARC4J, and others),
>  and debates around the limitations and flexibility of the
>  MARC format itself.
>    
>  Based on the provided source material, the advantages and
>  disadvantages of MARC are not explicitly detailed in a
>  structured or comprehensive manner. However, some insights
>  can be inferred from the fragmented discussions. MARC is
>  noted for its flexibility as a data structure, having been
>  designed to exploit the technological environment of its
>  time effectively, which suggests it was once well-suited to
>  library needs. On the other hand, it is described as an
>  "annoying data format," with limitations such as issues with
>  character encodings during conversion and a rigid record
>  length constraint in its leader field, reflecting its age
>  and the challenges of adapting it to modern systems.
>  Additionally, the complexity of parsing MARC data is
>  mentioned, alongside the need for specialized tools or
>  conversion to more manageable formats like MODS or MARCXML
>  to improve usability.
>  
>
>The results are not only plausible but I assert they are true and accurate. Thus, using a combination of natural language processing and LLM technology I was able to quickly and easily read, make sense of, and learn from a large corpus. [1] Even more, I can apply this process to any corpus, large or small: a book, a set of books, a few journal articles, hundreds of journal articles, etc.
>
>What does this cost? Financially, very little. I pay $20/month to use Ollama's cloud services. Computer-wise all of this can be done on a laptop computer, but the process is quicker and easier when I use my 64-core Linux computer. Granted there are environmental costs. Hmmm... Are there moral costs? That is to be discussed too. Are there professional issues? To be sure!
>
>All that said, we all continue to suffer from information overload. Libraries are a part of that problem. Just look at the size of your library's collection. Not small? The application of natural language processing and LLMs can make our collections more useful, and isn't that the point? To make our collections useful?
>
>If all of this is true, then why, when it comes to LLM technology, do I feel there is so much trepidation in Library Land? What am I missing? I'd really like to know.
>
>
>Note
>
>[1] The corpus includes about 2,200 postings for a total of 0.79 million words. By comparison, the Bible is about 0.8 million words long, Melville's Moby Dick is about 0.25 million words long, and based on my experience, the typical scholarly journal article is about 0.007 million words long.
>
>--
>Eric Lease Morgan <emorgan_at_nd.edu>
>Librarian Emeritus, University of Notre Dame
