Re: rdf and doi's

From: Marijane White <whimar_at_nyob> Date: Tue, 15 Jan 2019 18:30:15 +0000 To: CODE4LIB_at_LISTS.CLIR.ORG

Eric,

I think loading them into a triplestore and trying to answer questions is a fine idea.  From there, you might be able to create some visualizations, if you have those skills at hand.  This also strikes me as the sort of data that could augment data in a research profiling system.
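
As a rough illustration of the "load it and ask questions" idea, here is a minimal Python sketch using only the standard library. It is not a real triplestore: it pulls creator and journal triples out of records shaped like the sample quoted below, then counts them. The function names (extract_triples, count_objects, coauthor_pairs) are hypothetical, and the parsing assumes the nesting seen in that one sample rather than RDF/XML in general. A proper RDF library or a SPARQL endpoint would replace the hand-rolled parsing.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Namespace URIs copied from the sample record in the quoted message.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
FOAF = "http://xmlns.com/foaf/0.1/"
ABOUT = f"{{{RDF}}}about"


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for creators and journals.

    Assumption: each work is an rdf:Description whose creators and journal
    are nested as foaf:Person and bibo:Journal nodes, as in the sample.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for work in root.findall(f"{{{RDF}}}Description"):
        subject = work.get(ABOUT)
        for person in work.findall(f"{{{DCT}}}creator/{{{FOAF}}}Person"):
            triples.append((subject, DCT + "creator", person.get(ABOUT)))
        for journal in work.findall(f"{{{DCT}}}isPartOf/{{{BIBO}}}Journal"):
            triples.append((subject, DCT + "isPartOf", journal.get(ABOUT)))
    return triples


def count_objects(triples, predicate):
    """Count how often each object appears with the given predicate."""
    return Counter(o for _, p, o in triples if p == predicate)


def coauthor_pairs(triples):
    """Count how often two creators appear on the same work."""
    by_work = {}
    for s, p, o in triples:
        if p == DCT + "creator":
            by_work.setdefault(s, set()).add(o)
    pairs = Counter()
    for creators in by_work.values():
        pairs.update(combinations(sorted(creators), 2))
    return pairs
```

Running extract_triples over each harvested file and concatenating the results would give one list to query. count_objects with the dcterms creator or isPartOf predicate is a crude stand-in for "which authors or journals are central", and coauthor_pairs hints at who is "related" to whom.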

If you'd like to see an example of what others have done with harvested linked data, check out CTSASearch: http://research.icts.uiowa.edu/polyglot/

Marijane White, M.S.L.I.S.
Data Librarian, Assistant Professor
Oregon Health & Science University Library

Phone: 503.494.3484
Email: whimar_at_ohsu.edu
ORCiD: https://orcid.org/0000-0001-5059-4132

On 2019/01/15, 6:38 AM, "Code for Libraries on behalf of Eric Lease Morgan" <CODE4LIB_at_LISTS.CLIR.ORG on behalf of emorgan_at_ND.EDU> wrote:

    How might I exploit & learn from a set of RDF files harvested from DOIs?

    For a good time, I have written a suite of software to harvest bibliographic data from Web of Science, cache the results, and report on the whole. [1] Along the way I programmatically collect DOIs and then resolve them. The results include RDF streams. ("Thanks, Kevin Ford!") For example:

      curl -i -L -H "Accept: application/rdf+xml" http://dx.doi.org/10.3352/jeehp.2013.10.3

    And:

      <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:j.0="http://purl.org/dc/terms/"
        xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:j.2="http://purl.org/ontology/bibo/"
        xmlns:j.3="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="http://dx.doi.org/10.3352/jeehp.2013.10.3">
        <j.0:isPartOf>
        <j.2:Journal rdf:about="http://id.crossref.org/issn/1975-5937">
          <owl:sameAs>urn:issn:1975-5937</owl:sameAs>
          <j.0:title>Journal of Educational Evaluation for Health Professions</j.0:title>
          <j.1:issn>1975-5937</j.1:issn>
          <j.2:issn>1975-5937</j.2:issn>
        </j.2:Journal>
        </j.0:isPartOf>
        <j.0:creator>
        <j.3:Person rdf:about="http://id.crossref.org/contributor/sun-huh-112veziy3vi1o">
          <j.3:name>Sun Huh</j.3:name>
          <j.3:familyName>Huh</j.3:familyName>
          <j.3:givenName>Sun</j.3:givenName>
        </j.3:Person>
        </j.0:creator>
        <j.0:title>Revision of the instructions to authors to require... </j.0:title>
        <j.1:doi>10.3352/jeehp.2013.10.3</j.1:doi>
        <j.0:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date"
        >2013-04-30</j.0:date>
        <owl:sameAs rdf:resource="info:doi/10.3352/jeehp.2013.10.3"/>
        <j.0:identifier>10.3352/jeehp.2013.10.3</j.0:identifier>
        <j.2:volume>10</j.2:volume>
        <j.2:pageStart>3</j.2:pageStart>
        <j.1:startingPage>3</j.1:startingPage>
        <j.0:publisher>XMLArchive</j.0:publisher>
        <owl:sameAs rdf:resource="doi:10.3352/jeehp.2013.10.3"/>
        <j.1:volume>10</j.1:volume>
        <j.2:doi>10.3352/jeehp.2013.10.3</j.2:doi>
      </rdf:Description>
      </rdf:RDF>

    That's a pretty rich RDF stream! [2]

    As of right now, I have about 8000 of these streams representing publications of faculty here at my university. I can easily get tens of thousands more. How might I take advantage of this data? How can I go beyond parsing the RDF with XPath, stuffing the results into a database, and applying SQL to the result? How can I truly exploit the nature of the RDF and possibly manifest it as linked data?

    To answer my own question, I might put the data into a triple store, and then try to answer questions such as: which authors are central, which journals are central, which authors are "related" to whom, etc.

    What do you think?

    [1] https://github.com/ericleasemorgan/api-taskforce

    [2] And this rich data does not even take into account the cool, sometimes full-text URLs/URIs found in the HTTP Link header!

    -- 
    Eric Lease Morgan
    Digital Initiatives Librarian, Navari Family Center for Digital Scholarship
    Hesburgh Libraries

    University of Notre Dame
    250E Hesburgh Library
    Notre Dame, IN 46556
    o: 574-631-8604
    e: emorgan_at_nd.edu
    w: cds.library.nd.edu