Re: Discussion of id.loc.gov

From: Tom Keays <tomkeays.lists_at_nyob>
Date: Tue, 19 May 2009 16:39:13 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
Hi Karen,

Very interesting post.

Let's look at Italy--History--1492-1559--Fiction
<http://id.loc.gov/authorities/sh2008115565#concept>

The SKOS authority record has two broader terms:

 * Italy--History <http://id.loc.gov/authorities/sh85068936#concept>
 * Italy--History--1492-1559 <http://id.loc.gov/authorities/sh85068950#concept>

Looking then at Italy--History
<http://id.loc.gov/authorities/sh85068936#concept>

We see that it has no broader or narrower terms, the former because of
the situation you describe with the top term, "Italy", and the latter
because id.loc.gov/authorities doesn't track the subdivisions back
down.

Similarly, for Italy--History--1492-1559
<http://id.loc.gov/authorities/sh85068950#concept>

Broader Terms:

 * Italy--History <http://id.loc.gov/authorities/sh85068936#concept>

Narrower Terms:

 * Ceresole, Battle of, Ceresole Alba, Italy, 1544
<http://id.loc.gov/authorities/sh85022136#concept>
 * Marignano, Battle of, Melegnano, Italy, 1515
<http://id.loc.gov/authorities/sh85083407#concept>
 * Novara, Battle of, Novara, Italy, 1513
<http://id.loc.gov/authorities/sh85092857#concept>
 * Pavia, Battle of, Pavia, Italy, 1525
<http://id.loc.gov/authorities/sh85098875#concept>
 * Ravenna, Battle of, Ravenna, Italy, 1512
<http://id.loc.gov/authorities/sh85111575#concept>
 * Scannagallo, Battle of, Italy, 1554
<http://id.loc.gov/authorities/sh95000332#concept>
 * Vicenza, Battle of, Vicenza, Italy, 1513
<http://id.loc.gov/authorities/sh2003004382#concept>

But, again, there are no narrower terms corresponding to the finer
subdivisions that can be made on Italy--History--1492-1559.
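
To make the asymmetry concrete, here is a minimal Python sketch. The
two tables are hand-copied from the records quoted above, not fetched
from id.loc.gov, and the names BROADER, NARROWER and
missing_inverse_links are my own, used only for illustration. It lists
every broader link that has no matching narrower link on the other side.

```python
# Links hand-copied from the id.loc.gov records quoted in this message.
# Nothing is retrieved from the service; all names here are illustrative.
BROADER = {
    "sh2008115565": ["sh85068936", "sh85068950"],  # Italy--History--1492-1559--Fiction
    "sh85068950": ["sh85068936"],                  # Italy--History--1492-1559
    "sh85068936": [],                              # Italy--History
}

NARROWER = {
    "sh85068936": [],  # Italy--History lists no narrower terms
    "sh85068950": [    # Italy--History--1492-1559 lists only the battles
        "sh85022136", "sh85083407", "sh85092857", "sh85098875",
        "sh85111575", "sh95000332", "sh2003004382",
    ],
}


def missing_inverse_links(broader, narrower):
    """Return (child, parent) pairs where the parent does not list the child."""
    missing = []
    for child, parents in broader.items():
        for parent in parents:
            if child not in narrower.get(parent, []):
                missing.append((child, parent))
    return missing


for child, parent in missing_inverse_links(BROADER, NARROWER):
    print(f"{child} has broader {parent}, but {parent} has no narrower {child}")
```

On this data all three broader links come back as one-way, which is the
point: you can climb from a subdivided heading, but you cannot get back
down to it.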


Contrast LCSH to truly hierarchical ontologies. From the realm of
biochemistry, there is the tidy example of Chemical Entities of
Biological Interest (ChEBI), a "freely available dictionary of
molecular entities focused on ‘small’ chemical compounds."

Take the example of L-glutamic acid residue (CHEBI:29972)
<http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:29972>

Corresponding to broader terms, there are the "Outgoing" URIs:

 * L-glutamic acid residue (CHEBI:29972) is enantiomer of D-glutamic
acid residue (CHEBI:48096)
 * L-glutamic acid residue (CHEBI:29972) is conjugate acid of
L-glutamate residue (CHEBI:29973)
 * L-glutamic acid residue (CHEBI:29972) is a canonical amino-acid
residue (CHEBI:33700)
 * L-glutamic acid residue (CHEBI:29972) is a glutamic acid residue
(CHEBI:32483)
 * L-glutamic acid residue (CHEBI:29972) is substituent group from
L-glutamic acid

and "Incoming" URIs, which correspond to narrower terms:

 * D-glutamic acid residue (CHEBI:48096) is enantiomer of L-glutamic
acid residue (CHEBI:29972)
 * L-glutamate residue (CHEBI:29973) is conjugate base of L-glutamic
acid residue (CHEBI:29972)

and, finally, the "Tree view" which exposes the entire hierarchy above
the term:

* CHEBI:24431 molecular structure
 * CHEBI:23367 molecular entity
  * CHEBI:33579 main group molecular entity
   * CHEBI:33675 p-block molecular entities
    * CHEBI:33582 carbon group molecular entities
     * CHEBI:50860 organic molecular entity
      * CHEBI:33285 heteroorganic entities
       * CHEBI:36962 organochalcogen compounds
        * CHEBI:36963 organooxygen compounds
         * CHEBI:36586 carbonyl compound
          * CHEBI:33575 carboxylic acid
           * CHEBI:33709 amino acids
            * CHEBI:33704 .alpha.-amino acids
             * CHEBI:18237 glutamic acid
              * CHEBI:16015 L-glutamic acid
               * CHEBI:29972 L-glutamic acid residue
             * CHEBI:15705 L-.alpha.-amino acids
              * CHEBI:16015 L-glutamic acid
               * CHEBI:29972 L-glutamic acid residue

You can stop off at any point in the hierarchy, say, carboxylic acid
(CHEBI:33575) <http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:33575>
and you will find, among the list of incoming URIs, the next link in
the chain:

 * amino acids (CHEBI:33709) is a carboxylic acid (CHEBI:33575)

and so on down the line back to L-glutamic acid residue (CHEBI:29972).

L-glutamic acid residue (CHEBI:29972) is, itself, the end of the line
because its two incoming URIs

 * D-glutamic acid residue (CHEBI:48096) is enantiomer of L-glutamic
acid residue (CHEBI:29972)
 * L-glutamate residue (CHEBI:29973) is conjugate base of L-glutamic
acid residue (CHEBI:29972)

have reciprocal outgoing URIs

 * L-glutamic acid residue (CHEBI:29972) is enantiomer of D-glutamic
acid residue (CHEBI:48096)
 * L-glutamic acid residue (CHEBI:29972) is conjugate acid of
L-glutamate residue (CHEBI:29973)
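
The Tree view can also be walked mechanically. A minimal Python sketch,
assuming the parent links are exactly the ones shown in the Tree view
listing above (transcribed by hand, not retrieved from ChEBI);
TREE_PARENTS and paths_to_root are names I made up for the example:

```python
# Child -> parents, transcribed from the "Tree view" listing in this message.
# These are the links as displayed there, not a download of the ChEBI ontology.
TREE_PARENTS = {
    "CHEBI:29972": ["CHEBI:16015"],                 # L-glutamic acid residue
    "CHEBI:16015": ["CHEBI:18237", "CHEBI:15705"],  # L-glutamic acid (two parents)
    "CHEBI:18237": ["CHEBI:33704"],                 # glutamic acid
    "CHEBI:15705": ["CHEBI:33704"],                 # L-alpha-amino acids
    "CHEBI:33704": ["CHEBI:33709"],                 # alpha-amino acids
    "CHEBI:33709": ["CHEBI:33575"],                 # amino acids
    "CHEBI:33575": ["CHEBI:36586"],                 # carboxylic acid
    "CHEBI:36586": ["CHEBI:36963"],                 # carbonyl compound
    "CHEBI:36963": ["CHEBI:36962"],                 # organooxygen compounds
    "CHEBI:36962": ["CHEBI:33285"],                 # organochalcogen compounds
    "CHEBI:33285": ["CHEBI:50860"],                 # heteroorganic entities
    "CHEBI:50860": ["CHEBI:33582"],                 # organic molecular entity
    "CHEBI:33582": ["CHEBI:33675"],                 # carbon group molecular entities
    "CHEBI:33675": ["CHEBI:33579"],                 # p-block molecular entities
    "CHEBI:33579": ["CHEBI:23367"],                 # main group molecular entity
    "CHEBI:23367": ["CHEBI:24431"],                 # molecular entity
    "CHEBI:24431": [],                              # molecular structure (root)
}


def paths_to_root(term, parents):
    """Return every chain of terms leading from a root down to `term`."""
    ups = parents.get(term, [])
    if not ups:
        return [[term]]
    paths = []
    for parent in ups:
        for path in paths_to_root(parent, parents):
            paths.append(path + [term])
    return paths


for path in paths_to_root("CHEBI:29972", TREE_PARENTS):
    print(" > ".join(path))
```

It prints two chains, one through glutamic acid and one through
L-alpha-amino acids, both starting at CHEBI:24431. Every step is an
explicit link, so nothing has to be inferred from the label.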


As Mary Dykstra [1] described, the underlying problem is simply this:
LCSH is not a true thesaurus (although, with "narrower," "broader,"
and "related" terms, it masquerades as one).

At best, then, SKOS can only construct an approximation of a hierarchy
unless it finds a way to track the myriad possible subheadings that
can be applied at every step of the way.
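
One crude way to approximate that tracking is to derive the implied
broader headings from the string itself. The sketch below is only a
string heuristic of mine, not how id.loc.gov builds its links, and it
assumes "--" never occurs except as the subdivision separator;
implied_broader is a hypothetical helper name.

```python
def implied_broader(heading, sep="--"):
    """List the headings obtained by dropping trailing subdivisions, nearest first."""
    parts = heading.split(sep)
    return [sep.join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


print(implied_broader("Italy--History--1492-1559--Fiction"))
# ['Italy--History--1492-1559', 'Italy--History', 'Italy']
```

Note that it only produces strings. Whether each one corresponds to an
established authority record would still have to be looked up.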

Tom

--------------
[1]  Dykstra, M. (1988). "LC Subject Headings Disguised as a
Thesaurus." Library Journal 113(4): 42-46.


On Tue, May 19, 2009 at 10:56 AM, Karen Coyle <lists_at_kcoyle.net> wrote:
> OK, I did a blog post at:
> http://kcoyle.blogspot.com/2009/05/lcsh-as-linked-data-beyond-dash-dash.html
>
> Here's a cut and paste:
>
> The SKOS version of LCSH <http://id.loc.gov/authorities/> developed by LC
> has made some choices in how LCSH would be presented in a linked-data
> format. One of these choices is that the complex headings (which are the vast
> majority of them) are treated as a single string:
>
>   Italy--History--1492-1559--Fiction
>
>
> While this might fit appropriately as a SKOS vocabulary, in my opinion it
> does not work as linked data. I'm going to try to explain why, although it's
> quite complex. Part of that complexity is that LCSH is itself complex,
> primarily because there are many exceptions to any pattern that you might
> care to describe. (For more on this, I suggest Lois Mai Chan's Library of
> Congress Subject Headings, 4th edition, the chapter on geographic subject
> headings, pp. 67-89)
>
> Taking the heading above, as I mentioned in my previous post, the geographic
> term Italy is not in LCSH even though it can indeed be used as a subject
> heading. Instead, Italy is defined as a name heading in the LC name
> authorities file. In that file, and only in the name file, alternate forms
> of the name are included (altLabels, in SKOS terminology):
>
>   451 __ |a Repubblica italiana (1946- )
>   451 __ |a Italian Republic (1946- )
>   451 __ |a Wlochy
>   451 __ |a Regno d’Italia (1861-1946)
>   451 __ |a Iṭalyah
>   451 __ |a Italia
>   451 __ |a Italie
>   451 __ |a Italien
>   451 __ |a Italii︠a︡
>   451 __ |a Kgl. Italienische Regierung
>   451 __ |a Königliche Italienische Regierung
>
>
> There are no altLabels in the LCSH entry for Italy--etc. And because the
> term Italy is buried in an undifferentiated string, there is no linked data
> way to say that the Italy in Italy--History--1492-1559--Fiction is the same
> as http://id.loc.gov/authorities/n79021783, which will presumably be the URI
> for the name.
>
> It is assumed in LC authorities that the altLabels for a name term that
> appears in a subject heading apply to both the name used as a name and the
> name used as a subject heading. In the card catalog, where the name alone
> would appear first in the alphabetical browse of the cards, it was only
> necessary to make references to that "head" of the list, which would, in our
> case, be Italy alone. This has caused great problems in online catalogs
> where searching is by keyword, not a linear alphabetical search. Some
> systems manage to get around this by doing a string compare to the same
> subfields in name headings and subject headings, and then transferring the
> altLabel forms to the related subject headings.
>
>   $a Shakespeare, William, $d 1564-1616
>   $a Shakespeare, William, $d 1564-1616 $v Adaptations $v Periodicals
>
> In this case, the $a and $d subfields represent the same authoritative
> entity. The rules say that they are, and must be, the same authoritative
> entity. If they don't match exactly then someone has done something wrong.
> They are both instances of a name identified as "n 78095332", and which will
> presumably be given the URI http://id.loc.gov/authorities/n78095332. There
> is no question about that.
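
(Inline aside on the string-compare workaround described above. This is
a minimal Python sketch under my own assumptions, not any system's
actual code: headings are simplified to lists of (subfield code, value)
pairs, the Italy heading's subfield coding is my guess, the name
authority file is a toy dictionary holding three of the 451 forms
listed earlier, and name_portion, NAME_ALT_LABELS and
inherited_alt_labels are made-up names.)

```python
# MARC subdivision subfields: form, general, chronological, geographic.
SUBDIVISION_CODES = {"v", "x", "y", "z"}


def name_portion(subfields):
    """Return the subfields that precede the first subdivision, as a tuple."""
    head = []
    for code, value in subfields:
        if code in SUBDIVISION_CODES:
            break
        head.append((code, value))
    return tuple(head)


# Toy stand-in for the name authority file, keyed by the exact name subfields.
NAME_ALT_LABELS = {
    (("a", "Italy"),): ["Italia", "Italie", "Italien"],
}


def inherited_alt_labels(subject_subfields):
    """Look up alternate forms for the name at the head of a subject heading."""
    return NAME_ALT_LABELS.get(name_portion(subject_subfields), [])


# Assumed subfield coding for Italy--History--1492-1559--Fiction.
subject = [("a", "Italy"), ("x", "History"), ("y", "1492-1559"), ("v", "Fiction")]
print(inherited_alt_labels(subject))
# ['Italia', 'Italie', 'Italien']
```

The match is an exact comparison of the name subfields, so any
difference in punctuation or spacing between the two files breaks it.
A shared identifier would not have that weakness.
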
>
> There is also no question that when the name is used in a subject heading it
> has the full meaning that it is given in the name heading record, including
> alternate forms of the name and the many notes fields provided by the
> catalogers that created the authority record. That this doesn't appear in the
> LCSH file does not mean that it is not the case: it means only that the LCSH
> record assumes that the name record exists and provides that information,
> and that the information is applied to the name in the subject entry through
> the linear nature of the dictionary catalog.
>
> We mustn't confuse the form with the meaning. That LCSH has a rather arrested
> form is unfortunate, but it was never intended to be used outside of the
> context of the full set of authorities that gives full treatment to those
> things that have "proper names." (cf. Chan, chapter 4)
>
> If we wish for the LC authorities to be used in a linked data environment,
> then we have to make sure that the linking capabilities are there. Although
> I agree that each LCSH record has an identifier, and that identifier should
> be used, I don't agree that what is expressed in the LCSH record is a dumb,
> undifferentiated string. In this post I have addressed the relation to name
> headings, but there are other uses of controlled vocabularies within the
> subject headings that I haven't fully investigated yet.
>
>
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kcoyle@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596   skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
>
Received on Tue May 19 2009 - 16:41:11 EDT