Re: Tim Berners-Lee on the Semantic Web

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Fri, 30 Oct 2009 16:25:38 +1100
To: NGC4LIB_at_LISTSERV.ND.EDU
Hiya,

> You used an example of http://www.un.org as an identifier for the
> organization United Nations, and you asked:
...
> What I see here is a problem of repurposing a *location* as an *identifier*.

Not quite; this is also about the lack of semantics in the way RDF
uses the notion of identity. Even with reification you can't have a
subject identity and a resource identity be the same, because there is
only room for one interpretation per URI. Let me put it differently:
which URI has the best chance of survival, both as a resource locator
*and* as a subject identifier, and which do you trust the most and
think would be the most accurate?

  1. http://www.un.org/    (controlled by themselves)
  2. http://id.oclc.org/org/d8445    (controlled by third-party)

This isn't simply me being argumentative or difficult (I think :);
there are *lots* of little subtleties involved here. Identifiers and
resources must always be judged in terms of longevity, accuracy and
maintainability. An example of this complexity is which semantics are
being applied at the time the identifier is used. If we use the
un.org one, we must encompass the whole of the UN's history in that
identifier, the ever-changing organisation. But if OCLC used theirs,
what semantics are within? The first era? The era after the Cold War?
What it is now? This is about the bias of identifiers.

So with that in mind, this is why I have always proposed that the
libraries of the world are the people best equipped to do this job for
the rest of us; they have a pretty good idea of the epistemological
implications I've outlined here, even though they don't always bring
that knowledge through in their systems (and I'm happy to blame the
tool-makers on this one :), but surely that is something we should
strive for, no? (And I hope I'm making sense at this point.)

> But I think there is a simple solution. Here are two statements:
>
>   (something) has subject (http://www.un.org)
>   (something) has subject [organization that has home page
> (http://www.un.org)]

Well, shifting the responsibility of identity down the layers of
induction isn't as helpful as you might think, even though I
understand where you're going with it. But more importantly, you're
creating ontological expressions not found in the RDF stack by
default; you must make it yourself, and try to convince the rest of
the world they should adopt this practice and ontology. (However, RDF
2.0 was just released, and I have *not* paid attention to it, so
perhaps someone can let us know if I'm still out of my tree on this
one; maybe they've fixed it)
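
A rough sketch of the two quoted statements as plain Python tuples, to
show where the extra layer comes in. The predicate names (dc:subject,
foaf:homepage), the example.org document URI and the "_:org" blank-node
label are assumptions picked for illustration; nothing in the quoted
post names them.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library used.
# Predicate names and DOC are illustrative assumptions, not from the thread.
UN_PAGE = "http://www.un.org"
DOC = "http://example.org/some-document"  # hypothetical "(something)"

# Statement 1: the URI itself is used directly as the subject.
direct = [(DOC, "dc:subject", UN_PAGE)]

# Statement 2: the subject is an unnamed (blank) node that is only
# described by the home page it has.
indirect = [
    (DOC, "dc:subject", "_:org"),
    ("_:org", "foaf:homepage", UN_PAGE),
]


def subjects_of(triples, doc):
    """Return every object of a dc:subject triple for the given document."""
    return [o for s, p, o in triples if s == doc and p == "dc:subject"]


def is_blank(node):
    """Blank nodes are written with a leading '_:' in this sketch."""
    return node.startswith("_:")
```

In the second form the organisation has no identifier of its own;
a consumer can only find it by agreeing on the home-page predicate,
which is the extra ontology everybody would have to adopt.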

> The first one helps us link data only if we have an agreement on what we
> will use for the identifier.

Any identifier needs to be agreed upon, even inferred ones.

> However, as I believe is often the case for
> things that exist in the "real world", communities will have different
> identifiers for the organization (the LC name authority heading, a number of
> different standards for institution codes, etc.). We can't know what each
> other's identifiers are. We can, however, all know what the home page of the
> organization is, or what it calls itself in English. So, using LCNA as an
> example, we could have:
>
>   [1] (n79021345) has home page (http://www.un.org)
>   [2] (n79021345) calls itself (lang=en / United Nations)
...
> Although the latter, being a language string, is unlikely to have the
> necessary uniqueness.

I think it has been proven rather drastically that names do not, in
any shape or form, have that uniqueness. :)

> It can be used, however, as part of an inference
> decision that two triples may be referring to the same thing.

But identity from inference is what a lot of people are trying to get
away from, because it's a time-consuming, resource-gulping, imprecise
business. There is no knowledge in the identifier, only assumption
(even if the assumptions are good ones). And did I mention slow? And
prone to unpredictable results?
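
A minimal sketch of what identity-by-inference amounts to, assuming we
simply group identifiers that share a property value. Only n79021345
and the UN home page come from the thread; the "lcna:" prefix, the
other identifiers and the property names are made up for the example.

```python
from collections import defaultdict

# (identifier, property, value) statements from different communities.
# "other:org-42" and "other:band-7" are hypothetical identifiers.
records = [
    ("lcna:n79021345", "homepage", "http://www.un.org"),
    ("lcna:n79021345", "name_en", "United Nations"),
    ("other:org-42", "homepage", "http://www.un.org"),
    ("other:band-7", "name_en", "United Nations"),  # a different thing, same name
]


def candidates_by(records, prop):
    """Group identifiers that share a value for `prop`.

    Returns only the groups with more than one identifier, i.e. the
    pairs an inference step would *assume* refer to the same thing.
    """
    groups = defaultdict(set)
    for subj, p, value in records:
        if p == prop:
            groups[value].add(subj)
    return {v: ids for v, ids in groups.items() if len(ids) > 1}
```

Grouping on "homepage" pairs the two organisation identifiers, while
grouping on "name_en" also pulls in the unrelated "other:band-7". The
result is only ever a candidate match, and every new statement means
scanning and regrouping again.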

Again, I think I know what you're saying, and this concept is known in
my world as a topic proxy (a topic proxy is a set of topics that
together represent a subject in the real world), and is defined in the
Topic Maps Reference Model (a more abstract exercise; only approach it
if you're brave and have time to waste :). I know people in my
community who have specifically created topic proxy engines that do
all their external work in RDF and the SemWeb world, but whose internals
are adopted from an abstract notion of proxies. However, the
difference is that identity is not inferred, but simply stated as a
part of some external mechanism. It was decided that the subject of
identity was too hard to solve inside that standard, even on the most
abstract or pragmatic level.

This leaves it open to various interpretations, of which the Topic Maps
Data Model has one (Subject Locators for subject locations, Subject
Indicators for external identifiers, and Occurrences for resource
locators), but that's a different post altogether. This is all just me
pointing out that proxy thinking isn't easy; I agree it's the
*correct* approach, just not a very practical one.
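
To make the three roles a bit more tangible, here is a small sketch
of a topic that keeps them apart and treats identity as stated rather
than inferred. The class and function names are hypothetical and this
is not any Topic Maps engine's API; the two URIs are the ones from the
list earlier in this post, and which role each plays is my choice for
the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # URIs of resources that *describe* the subject (external identifiers).
    subject_identifiers: set = field(default_factory=set)
    # URIs of resources that *are* the subject (e.g. the web site itself).
    subject_locators: set = field(default_factory=set)
    # URIs of resources merely related to the subject.
    occurrences: set = field(default_factory=set)


def same_subject(a, b):
    """Stated identity: a shared subject identifier or a shared subject
    locator. Occurrences never contribute to identity."""
    return bool(
        a.subject_identifiers & b.subject_identifiers
        or a.subject_locators & b.subject_locators
    )


# The organisation, identified by a third-party URI, with its home page
# attached as an occurrence only.
un_org = Topic(
    subject_identifiers={"http://id.oclc.org/org/d8445"},
    occurrences={"http://www.un.org/"},
)
# The web site as a subject in its own right.
un_site = Topic(subject_locators={"http://www.un.org/"})
# Someone else's topic that states the same external identifier.
un_other = Topic(subject_identifiers={"http://id.oclc.org/org/d8445"})
```

Here same_subject(un_org, un_other) holds because both state the same
identifier, while same_subject(un_org, un_site) does not, even though
the same URL appears in both topics; the URL plays a different role
in each.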

I'll also tackle your other post here, which is more or less the same
as my spiel on proxies above; you are creating a proxy, and want that
proxy to have inferred identity, which is good in theory but hopeless
in practice. Also, thanks for the link, I've subscribed; good stuff.

As to records, the notion *will* break down. Each property will have
greater value on its own, and we will create ontological proxies to
bind their context together. We're already seeing this debate in the
RDFa world, over what significance we should give to the URI of the
page the RDF statements are in. It seems, rather annoyingly, that
nothing can stand on its own two feet. :)


Regards,

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ ----------------------------------------------
------------------ http://www.google.com/profiles/alexander.johannesen ---
Received on Fri Oct 30 2009 - 01:27:08 EDT