question and answer system

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Thu, 12 Jan 2023 10:44:20 -0500
To: CODE4LIB_at_LISTS.CLIR.ORG

I have been playing with a new toy -- a question and answer system. [1, 2]

Here's how it works. Save a document as a plain text file. The document can be just about anything that makes sense: a job posting, a conference announcement, or a journal article, for example. Apply a previously created machine learning model to the document, and the result is a list of questions. Feed the list of questions and the document to another model, and get back a list of answers. These models are embedded and configurable in a couple of Python scripts, as the links below outline. Most of the models are available from Hugging Face, a repository of machine learning models. [3]
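
For the curious, the two-step flow described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the scripts behind the links: the names question_and_answer, toy_generate, and toy_answer are hypothetical, and the two toy functions are stand-ins that would be swapped for wrappers around real models. The sample text is a trimmed excerpt of the announcement.

```python
from typing import Callable, Dict, List

# Hypothetical shapes for the two models: anything callable this way can be
# plugged in, for example a wrapper around a model from Hugging Face.
QuestionGenerator = Callable[[str], List[str]]
Answerer = Callable[[str, str], str]


def question_and_answer(document: str,
                        generate: QuestionGenerator,
                        answer: Answerer) -> List[Dict[str, str]]:
    """Step 1: document -> questions. Step 2: (question, document) -> answer."""
    questions = generate(document)
    return [{"question": q, "answer": answer(q, document)} for q in questions]


# Toy stand-ins so the sketch runs without any model at all. They are NOT
# the models used in this message.
def toy_generate(document: str) -> List[str]:
    return ["What is the priority deadline for all applications?"]


def toy_answer(question: str, document: str) -> str:
    # Return whatever follows the last " is " in the first sentence
    # that mentions a deadline.
    for sentence in document.split(". "):
        if "deadline" in sentence and " is " in sentence:
            return sentence.rsplit(" is ", 1)[1].rstrip(".")
    return ""


document = ("The priority deadline for all applications is January 27, 2023. "
            "If your application is accepted, we will follow up with you no "
            "later than February 3, 2023.")

results = question_and_answer(document, toy_generate, toy_answer)
for result in results:
    print(result["question"], "->", result["answer"])
```

Keeping the two models behind plain callables means either one can be replaced without touching the other, which is roughly what "configurable" amounts to here.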

I applied my implementation to a message sent to our list earlier today, and a few of the more interesting questions and answers include:

  How much do participants travel stipends?

     answer: up to $1000
    context: rous support from the Mellon Foundation, participant
             travel stipends (up to $1000) are available to offset air
             and/or ground transportation, parking, 

  What date will we follow up with you if your application is accepted?

     answer: February 3, 2023
    context: application is accepted, we will follow up with you no
             later than February 3, 2023. For more details, including an
             agenda, see the Event Website <ht

  What is a publication medium that is both a primary source and a networked
  container of primary sources?

     answer: the web
    context: is both a primary source and a networked container of
             primary sources, the web presents challenges of scale and
             complexity for those that seek to int


The full list of about twenty questions and answers is attached.
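
The "context" lines above look like a fixed-size window of characters cut from around each answer, which is why they start and stop mid-word. Assuming that is what is going on, the layout can be approximated as follows. The function names, the 150-character window, and the 76-column wrap are my own guesses for illustration, not settings taken from the scripts, and a real answering model would normally report the answer's position rather than leaving it to str.find.

```python
import textwrap


def context_window(document: str, start: int, end: int, width: int = 150) -> str:
    """Return roughly `width` characters of `document` centred on document[start:end]."""
    pad = max(width - (end - start), 0) // 2
    left = max(start - pad, 0)
    right = min(end + pad, len(document))
    return document[left:right]


def format_entry(question: str, document: str, answer: str, width: int = 150) -> str:
    """Lay out one question in the answer/context style of the listing."""
    # Assumption: locate the answer with str.find instead of model offsets.
    start = document.find(answer)
    if start < 0:
        raise ValueError("answer not found in document")
    context = context_window(document, start, start + len(answer), width)
    wrapped = textwrap.fill(context, width=76,
                            initial_indent="    context: ",
                            subsequent_indent=" " * 13)
    return f"  {question}\n\n     answer: {answer}\n{wrapped}\n"


document = ("With generous support from the Mellon Foundation, participant "
            "travel stipends (up to $1000) are available to offset air and/or "
            "ground transportation.")
entry = format_entry("What is the maximum amount of travel stipends?",
                     document, "$1000", width=40)
print(entry)
```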

I did the same sort of thing against chapters of Moby Dick, asking questions like "Who is Ahab?", "Where did they sail?", and "What is whaling?" The answers are oftentimes quite plausible.

This sort of system can be applied more broadly in Library Land. Students, researchers, and scholars are suffering from information overload; we all continue to drink from the proverbial firehose. Given something like the system outlined above, librarians and libraries can go beyond providing access to data, information, and knowledge. More specifically, we can support the process of using & understanding data, information, and knowledge.

Fun with digital scholarship?


[1] generate questions - https://haystack.deepset.ai/tutorials/13_question_generation
[2] answer questions - https://haystack.deepset.ai/tutorials/01_basic_qa_pipeline
[3] Hugging Face - https://huggingface.co/models

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

https://cds.library.nd.edu




Questions and answers

This is a list of questions and answers rooted in a conference announcement posted to the Code4Lib mailing list. The announcement was fed to a machine learning model which returned a list of questions. The questions, along with the announcement, were then fed to another model which returned answers. In this particular case, the answers are more than plausible, if not 100% accurate. Fun with digital scholarship. --Eric Lease Morgan <emorgan_at_nd.edu>, January 12, 2023


  How much do participants travel stipends?

     answer: up to $1000
    context: rous support from the Mellon Foundation, participant
             travel stipends (up to $1000) are available to offset air
             and/or ground transportation, parking, 


  On what date will the workshop be held alongside the ACRL 2023 Conference?

     answer: March 15, 2023
    context: https://archive-it.org/blog/digital-scholarship-and-the-web/>
             held on March 15, 2023 alongside the ACRL 2023 Conference
             <https://acrl2023.us2.pathable.c


  What date will we follow up with you if your application is accepted?

     answer: February 3, 2023
    context: application is accepted, we will follow up with you no
             later than February 3, 2023. For more details, including an
             agenda, see the Event Website <ht


  What does the Event Website contain?

     answer: an agenda
    context: with you no later than February 3, 2023. For more
             details, including an agenda, see the Event Website
             <https://archive-it.org/blog/digital-scholarsh


  What do participants gain familiarity with using web archives?

     answer: web archive research use cases and how libraries support them
    context: s as a primary source, gain familiarity with web
             archive research use cases and how libraries support them; and
             acquire hands-on experience creating w


  What is a publication medium that is both a primary source and a networked container of primary sources?

     answer: the web
    context: is both a primary source and a networked container of
             primary sources, the web presents challenges of scale and
             complexity for those that seek to int


  What is required to attend the workshop?

     answer: applicants
    context: The Internet Archive <https://archive.org/> invites
             applicants to a daylong workshop Digital Scholarship and the
             Web: An Introduction to Data Analysi


  What is the acronym for Archives Research Compute Hub?

     answer: ARCH
    context: putationally analyzing web archives using Archives
             Research Compute Hub (ARCH)
             <https://webservices.archive.org/pages/arch>. Participant
             Support This


  What is the maximum amount of travel stipends?

     answer: $1000
    context: s support from the Mellon Foundation, participant
             travel stipends (up to $1000) are available to offset air
             and/or ground transportation, parking, two

  What is the priority deadline for all applications?

     answer: January 27, 2023
    context: space is limited and the priority deadline for all
             applications is January 27, 2023. If your application is
             accepted, we will follow up with you no la


  What kind of support does the Mellon Foundation provide?

     answer: generous support from the Mellon Foundation, participant travel stipends
    context: ever registration is limited, and with generous
             support from the Mellon Foundation, participant travel stipends
             (up to $1000) are available to offset 


  What type of production occurs globally?

     answer: digital information
    context: 023.us2.pathable.com/> in Pittsburgh, PA. Every day,
             significant digital information production occurs globally,
             much of it across the web (e.g., new


  What will participants learn about web archives as a primary source?

     answer: familiarity with web archive research use cases and how libraries support them
    context: archives as a primary source, gain familiarity with
             web archive research use cases and how libraries support them;
             and acquire hands-on experience cr


  Where can you send any questions?

     answer: commwebsinfo_at_archive.org.
    context: genda, see the Event Website
             <https://archive-it.org/blog/digital-scholarship-and-the-web/>.
             Please direct any questions to commwebsinfo_at_archive.org. 


  Where can you submit an application?

     answer: The Internet Archive
    context: The Internet Archive <https://archive.org/> invites
             applicants to a daylong workshop Digital Scholarship and the
             Web: An Introduction to Data Analysi


  Where is the workshop held?

     answer: Pittsburgh, PA
    context: de the ACRL 2023 Conference
             <https://acrl2023.us2.pathable.com/> in Pittsburgh, PA. Every
             day, significant digital information production occurs glob


  Who invites applicants to a daylong workshop on Digital Scholarship and the Web: An Introduction to Data Analysis and Instruction?

     answer: The Internet Archive
    context: The Internet Archive <https://archive.org/> invites
             applicants to a daylong workshop Digital Scholarship and the
             Web: An Introduction to Data Analysi


Received on Thu Jan 12 2023 - 10:11:41 EST