Re: FRBR WEMI and identifiers

From: Ross Singer <rossfsinger_at_nyob> Date: Fri, 13 Nov 2009 11:07:53 -0500 To: NGC4LIB_at_LISTSERV.ND.EDU

On Fri, Nov 13, 2009 at 3:51 AM, Weinheimer Jim <j.weinheimer_at_aur.edu> wrote:

> Please correct me if I am wrong, but if all we can do is provide 19th century browse displays of our headings--and headings are the vast majority of the control we exert over the records we create and the materials in our collections--we can do that right now and it is met by incomprehension and uncaring on the part of the public with the result that they ignore our tools whenever they can. I don't see that changing now.
>
> Is there a way out of this? Or should we start working a lot more with dbpedia?

I think we need to back up.  Way up.

Ok.  What id.loc.gov/authorities gives us are identifiers for
authorized subject headings.  The easiest way to look at this would be
to compare it to a doi:  info:doi/10.1109/MIS.2006.62.  So when we use
that identifier, we are unambiguously referring to that article.  We
don't need to worry about title capitalization/punctuation, whether or not
the author strings are first last or last, first, etc.  We have a
thing, we agree on what that thing is (thanks to our identifier), now
we can begin to talk about said thing.

So the same concept applies to:
http://id.loc.gov/authorities/sh85120839#concept.  This URI stands for
the subject heading we use to describe a work that is about "theories
derived from a school of thought that originated at Oxford University
about the actual authorship of the writings of one William
Shakespeare" (or something like that).  That is all that
"http://id.loc.gov/authorities/sh85120839#concept" is intended to
accomplish.

Of course, that in and of itself isn't terribly useful.  But now we
have our identifier to begin making assertions against.  For
discoverability, we need some text for queries to match and for the
concept being identified to be recognizable to humans.  By convention,
we concatenate our subject fields with "--" or ", ", depending on the
subfield, so LC applied this convention to the prefLabels and
altLabels.  For one thing, it's recognizable; for another, since it
follows convention, it makes it much easier to match our existing 6xx
fields to their corresponding skos:Concept uri.
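
To make that matching step concrete, here is a minimal Python sketch.
It assumes a 6xx field has already been parsed into (subfield code,
value) pairs and that a label-to-URI table has been built from the SKOS
data.  The helper names (heading_string, find_concept), the one-entry
table and the exact label string are illustrative assumptions, not LC's
data or code.

```python
# Subfields joined with "--" in this sketch (assumption: v, x, y and z
# are the subdivision subfields; anything else is joined with a space,
# relying on the punctuation already present in the field).
SUBDIVISION_CODES = {"v", "x", "y", "z"}


def heading_string(subfields):
    """Build a display string from the (code, value) pairs of a 6xx field."""
    parts = []
    for code, value in subfields:
        value = value.strip()
        if not parts:
            parts.append(value)
        elif code in SUBDIVISION_CODES:
            parts.append("--" + value)
        else:
            parts.append(" " + value)
    return "".join(parts)


# Hypothetical lookup table; the label text is assumed for illustration.
LABEL_TO_URI = {
    "Shakespeare, William, 1564-1616--Authorship--Oxford theory":
        "http://id.loc.gov/authorities/sh85120839#concept",
}


def find_concept(subfields):
    """Return the concept URI whose label matches the field, or None."""
    return LABEL_TO_URI.get(heading_string(subfields))


field_600 = [
    ("a", "Shakespeare, William,"),
    ("d", "1564-1616"),
    ("x", "Authorship"),
    ("x", "Oxford theory"),
]
print(find_concept(field_600))
```

The string is only the lookup key here; once the URI is found, the
string is no longer needed to say which concept is meant.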

However, all they are are labels to help describe the resource
"http://id.loc.gov/authorities/sh85120839#concept".  They are not
intended to act as identifiers themselves.  Worrying that we have just
another text string is a red herring: it's just a display label and
has nothing to do with the way
http://id.loc.gov/authorities/sh85120839#concept relates (or does not
relate) to other concepts.

One of your concerns was that "RDF can't provide these relationships"
-- this isn't true.  RDF can relate any two resources together:
basically anything that can be represented as "subject predicate
object" can be defined in RDF (subject is the URI of the resource, in
this case "http://id.loc.gov/authorities/sh85120839#concept",
predicate is the URI of the relationship and object is a uri or a
literal).
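
As a rough illustration of the "subject predicate object" shape, here
is a small Python sketch that uses plain tuples rather than any RDF
library.  The Triple type, the objects_of helper and the label text are
assumptions made for this example; the SKOS and RDF namespace URIs are
the standard ones.

```python
from typing import NamedTuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONCEPT = "http://id.loc.gov/authorities/sh85120839#concept"


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the relationship
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False  # True when obj is text, not a URI


triples = [
    # The object is another resource (a URI).
    Triple(CONCEPT, RDF_TYPE, SKOS + "Concept"),
    # The object is a literal; the label text is assumed for illustration.
    Triple(CONCEPT, SKOS + "prefLabel",
           "Shakespeare, William, 1564-1616--Authorship--Oxford theory",
           True),
]


def objects_of(triples, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, CONCEPT, RDF_TYPE))
```

Note that the label is just one more object hanging off the URI; it is
the URI, not the label, that other statements point at.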

This flexibility, however, has its limitations.  Just because you can
link any two resources together doesn't mean anybody is going to
understand what your relationship means.  So, like our example with
the prefLabel, practitioners of RDF tend to rely on conventions (which
is to say, vocabularies that people actually use) and avoid making up
their own predicates, since these will be devoid of meaning outside of
local implementations.

Which gets us to coordination.  LC chose to model LCSH primarily in
SKOS, because that's the conventional way in the semantic web to
represent thesauri.  The "S" in SKOS, however, stands for "simple": it
deals with concepts, narrower, broader, and related relationships (in
a nutshell), but coordination is outside of its current scope.  There
is currently talk about incorporating coordination into SKOS, whether
as a revision to the core or as an extension, but there's no current
convention for representing these sorts of relationships.

As I mentioned earlier in this thread, Ed Summers proposed a simple
solution to provide a rough representation of coordination:  roughly
that the concept is the union of the coordinated concepts that it's
composed of.  The only people who replied were me (who abstained out
of ignorance on the particulars of concept coordination), Karen Coyle,
Dan Matei, Ed Jones and Jonathan Rochkind.  The thread then diverged
into a discussion of the merits of FAST.

Your particular example would be problematic to model anyway.

What we have is the coordination of the personal name concept for
William Shakespeare with the general subdivisions for Authorship and
Oxford theory.

The immediate problem is that the NAF isn't currently in id.loc.gov,
so there's no "William Shakespeare" concept to relate to.
id.loc.gov/authorities could pull this from viaf.org right now, but
that presents other complications.  "Oxford theory" isn't an
authorized heading, either, so its relationship is going to have to be
dealt with in some capacity as well (although it's not immediately
clear how this would work).

Then there's a secondary issue, as well.  While we're coordinating the
concepts for William Shakespeare and Authorship (and adding a new
layer), there's a concept for the coordination of William Shakespeare
and Authorship, as well:
http://id.loc.gov/authorities/sh85120833#concept.  What is our
relationship to that?

So, in order to address coordination in LCSH we need:

- a new resource to represent the coordination
- predicates to define the relationships between our concept and the top
  concept and subdivision concepts
- predicates to define the relationship between our resource and the
  coordination
- a predicate to define the relationship (whatever it may be) between
  our resource and http://id.loc.gov/authorities/sh85120833#concept

with no prior art to draw upon.
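
Purely to show the shape of what that list asks for, here is a Python
sketch with invented names.  Everything under example.org is
hypothetical: the namespace, the predicates (topConcept, subdivision,
coordination, relatedCoordination), the coordination resource and the
stand-in URIs for Shakespeare and the two subdivisions.  It also
assumes "our concept" and "our resource" both mean sh85120839.  None of
this is an existing vocabulary or a proposal; it only counts the
moving parts.

```python
EX = "http://example.org/coordination#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

OUR_CONCEPT = "http://id.loc.gov/authorities/sh85120839#concept"
SHAKESPEARE_AUTHORSHIP = "http://id.loc.gov/authorities/sh85120833#concept"

# Hypothetical stand-ins: none of these have real identifiers here.
COORDINATION = "http://example.org/coordinations/1"
SHAKESPEARE = "http://example.org/names/shakespeare"
AUTHORSHIP = "http://example.org/subdivisions/authorship"
OXFORD_THEORY = "http://example.org/subdivisions/oxford-theory"

triples = [
    # 1. a new resource to represent the coordination
    (COORDINATION, RDF_TYPE, EX + "Coordination"),
    # 2. our concept -> top concept and subdivision concepts
    (OUR_CONCEPT, EX + "topConcept", SHAKESPEARE),
    (OUR_CONCEPT, EX + "subdivision", AUTHORSHIP),
    (OUR_CONCEPT, EX + "subdivision", OXFORD_THEORY),
    # 3. our resource -> the coordination
    (OUR_CONCEPT, EX + "coordination", COORDINATION),
    # 4. our resource -> the existing Shakespeare/Authorship concept
    (OUR_CONCEPT, EX + "relatedCoordination", SHAKESPEARE_AUTHORSHIP),
]


def invented_predicates(triples):
    """List the predicates that would have to be newly defined."""
    return sorted({p for _, p, _ in triples if p.startswith(EX)})


for predicate in invented_predicates(triples):
    print(predicate)
```

Even this toy version needs four made-up predicates and a made-up
class, which is the point: nobody else would know what they mean.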

LC, seeing that this could take forever, chose to release
id.loc.gov/authorities based on a faithful mapping of the explicit
relationships in the MARC to SKOS.

The important thing to keep in mind, however, is that this established
our identifiers.  With linked data, not every conceivable problem
needs to be solved before the data is made available.  In fact, that's
the whole point of linked data.  Because we have the uris available to
identify the concept resources, you or I or anybody in the world,
inside libraries or out, can make assertions upon them and others can
confidently take these declarations and know exactly what is being
talked about.

I started lcsubjects.org based on the skos data so I can say:
http://lcsubjects.org/subjects/sh85016186#concept
[umbel:linksEntity]
http://dbpedia.org/resource/Boxing

or
http://lcsubjects.org/subjects/sh85010942#concept
[wgs84:location]
http://sws.geonames.org/3831554/

And you can safely use them (assuming you trust my assertions) because:
http://lcsubjects.org/subjects/sh85016186#concept
[skos:exactMatch]
http://id.loc.gov/authorities/sh85016186#concept

and

http://lcsubjects.org/subjects/sh85010942#concept
[skos:exactMatch]
http://id.loc.gov/authorities/sh85010942#concept
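
Here is a minimal Python sketch of how a consumer might use those
exactMatch links to carry my assertions over to the id.loc.gov URIs.
The triples are the ones listed above, written with prefixed predicate
names as plain strings; the helper names (equivalents,
assertions_about) are made up for this example, and it only follows
exactMatch one hop.

```python
LCS_BOXING = "http://lcsubjects.org/subjects/sh85016186#concept"
LOC_BOXING = "http://id.loc.gov/authorities/sh85016186#concept"
LCS_PLACE = "http://lcsubjects.org/subjects/sh85010942#concept"
LOC_PLACE = "http://id.loc.gov/authorities/sh85010942#concept"

triples = [
    (LCS_BOXING, "umbel:linksEntity", "http://dbpedia.org/resource/Boxing"),
    (LCS_PLACE, "wgs84:location", "http://sws.geonames.org/3831554/"),
    (LCS_BOXING, "skos:exactMatch", LOC_BOXING),
    (LCS_PLACE, "skos:exactMatch", LOC_PLACE),
]


def equivalents(uri, triples):
    """URIs tied to `uri` by skos:exactMatch in either direction (one hop)."""
    found = {uri}
    for s, p, o in triples:
        if p == "skos:exactMatch":
            if s == uri:
                found.add(o)
            elif o == uri:
                found.add(s)
    return found


def assertions_about(uri, triples):
    """Collect (predicate, object) pairs made about `uri` or its matches."""
    same = equivalents(uri, triples)
    return sorted((p, o) for s, p, o in triples
                  if s in same and p != "skos:exactMatch")


# Ask about the id.loc.gov URI, get what was said about the lcsubjects.org one.
print(assertions_about(LOC_BOXING, triples))
```

Whether to accept the result is still a question of whether you trust
whoever made the exactMatch statement.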

But this stuff can't happen overnight.  It takes time:  we're talking
about large datasets being matched against large datasets with no real
datapoints to match to except strings.

-Ross.