Re: Tim Berners-Lee on the Semantic Web

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Sat, 24 Oct 2009 08:30:16 +1100
To: NGC4LIB_at_LISTSERV.ND.EDU

On Fri, Oct 23, 2009 at 05:53, Karen Coyle <lists_at_kcoyle.net> wrote:
> great idea, but we're talking about sharing the data, so somehow we have to
> share the URIs. [...] In order to share and link data we have to share
> and understand identities.

Yes, very spot on.

I've been thinking about this for the last day or two, and so instead
of my usual whinging I'll try to outline something actually useful.

As I mentioned in an earlier post, most of this comes down to the
representationalism of basic epistemology, that is, to the ways we
agree on the identity of things. It is so often said in the library world that
identity management is done through things like MARC authority
records, but few seem to be willing to peek inside them and dissect
what's holding them together. And the truth inside is that it's just
another text-based database with a few non-directional record
identifiers that may or may not be known. In other words, there's not
much in there in terms of identity apart from the agreed-upon culture
that surrounds it (especially the culture of AACR2 / MARC), i.e. the
librarians themselves.

* A side note from me on this: FRBR hasn't got identity control (has
it been left to a sister model?), nor does it contain important
entities like authors and publishers (in fact, they may be some of the
*most* important parts!), which I find, quite frankly, rather bizarre
as they are *key*, but hey.

So, let's proceed with something I've written about here many times
over, but which is important enough to repeat:

What is identity management? Well, in our case it is the basic
question of epistemology, really; "How do we know what we know?", or,
how do we know that book A was written by author B under the
pseudonym C, published by D in editions E, F and G in countries H and
I, and that author J made a film script based on edition F and used
author B's real name? How do we model this? What does it even mean
that something *has* an identity? And how can we *know* that book A
and book Z aren't the same book? How do they relate?

To answer any of these questions (a lot of this is FRBR and RDA
domain specific, of course), we need to work out the one thing that
binds them all together: identity.

To establish identity, we must all agree on that particular identity,
so the first thing is to agree on a way to share and use these
identities, and the obvious technology for this is the URI. But hang
on, the RDF / Semantic Web crowd has this same problem, and it is
still very much unresolved (there's a hack popular these days that
uses HTTP status codes and headers to figure out if the identity is
for the subject or for the URI itself ... tricky business, dirty hack)
due to the following:

In the RDF world, the ultimate resource is the URI itself, and URIs
are used to identify as well. But if I say that a subject is about
"http://www.un.org/", is my subject the UN as an organisation, or
their website as a whole, or the page that HTTP returns? In the Topic
Maps world we solve this by having two distinct kinds of identity
(direct and indirect) and a third, more subtle way of handling
identity management. At this point, I'd like to point you to:
   http://www.ontopia.net/topicmaps/materials/identitycrisis.html
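
To make the direct / indirect distinction a bit more concrete, here is
a minimal Python sketch. In Topic Maps terms, direct identity is a
"subject locator" (the resource at the address *is* the subject) and
indirect identity is a "subject identifier" (the resource at the
address only indicates the subject). The Topic class, its field names
and the same_subject() helper are my own illustration, not any real
Topic Maps API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, stripped-down topic: a name plus two kinds of identity."""
    name: str
    # Direct identity: the subject IS the resource found at this address.
    subject_locators: set = field(default_factory=set)
    # Indirect identity: the resource at this address only indicates the subject.
    subject_identifiers: set = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """Topics match only when they share an identity of the same kind."""
    return bool(
        a.subject_locators & b.subject_locators
        or a.subject_identifiers & b.subject_identifiers
    )


# The same URI string, used in two different roles.
un_website = Topic("UN website", subject_locators={"http://www.un.org/"})
un_org = Topic("United Nations", subject_identifiers={"http://www.un.org/"})
un_org_again = Topic("UN", subject_identifiers={"http://www.un.org/"})

print(same_subject(un_website, un_org))    # False: locator vs identifier
print(same_subject(un_org, un_org_again))  # True: both use it as an identifier
```

The point of keeping the two sets apart is that the same address can
be used for the website and for the organisation without the two ever
being confused.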

Next, we need to communicate and share the identities. For this to
happen on a larger scale, you need trust in two ways, internal and
external. The internal trust is trust between the peers of a (human)
network who are responsible for its infrastructure, and the external
trust is how outside parties trust that network. Libraries are in a
rather unique position here, having *high* trust in both camps, a
position which I personally see as key to their success but also as an
amazing opportunity to save the world [TM] and do the right thing. The
fact is, the world truly needs this, and the library world truly needs
to seize the opportunity to do so.

So take something like Conal Tuohy's “Entity Authority Tool Set” (EATS)
project (at the New Zealand Digital Library) in which you define
subjects and locators (uh, working from memory, so Conal needs to
correct my errors), tweak it so that it works distributed, and team up
with, use, or otherwise talk to Kal and Graham about their
http://www.subj3ct.com which distributes and merges PSIs (published
subject indicators). Through these two simple (well, relatively
speaking) tools the library world could create a whole world of
identifiers managed through the diligence and trust of librarians, let
the mechanics of internal trust work out how the external trust
factors are to be played, and give the world the most important piece
of knowledge management *ever*. And it wouldn't take more resources
than we are already sacrificing to crappy tools and at the altar of
MARC.
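
As a rough illustration of what "distributes and merges" could mean in
practice, here is a small Python sketch that merges identity records
from different sources whenever they share an identifier. This is only
my own assumption about the general idea, not how EATS or subj3ct
actually work; the merge_by_identifier() function and the example.org /
example.net URIs are made up for the occasion.

```python
def merge_by_identifier(records):
    """Merge (identifiers, names) records that share any identifier.

    Merging is transitive: if A shares an identifier with B, and B
    shares one with C, all three end up in the same merged record.
    """
    merged = []  # groups with mutually disjoint identifier sets
    for identifiers, names in records:
        identifiers, names = set(identifiers), set(names)
        remaining = []
        for other_ids, other_names in merged:
            if identifiers & other_ids:
                # Shared identifier: absorb the existing group.
                identifiers |= other_ids
                names |= other_names
            else:
                remaining.append((other_ids, other_names))
        remaining.append((identifiers, names))
        merged = remaining
    return merged


# Hypothetical records from three different libraries plus one unrelated author.
records = [
    ({"http://example.org/psi/mark-twain"}, {"Twain, Mark"}),
    ({"http://example.org/psi/mark-twain", "http://example.net/id/clemens"},
     {"Clemens, Samuel"}),
    ({"http://example.net/id/clemens"}, {"Samuel Langhorne Clemens"}),
    ({"http://example.org/psi/jane-austen"}, {"Austen, Jane"}),
]

merged_records = merge_by_identifier(records)
print(len(merged_records))  # 2
```

The first and third records have no identifier in common, yet they end
up together because the second one bridges them, which is exactly why
agreeing on shared identifiers matters more than agreeing on names.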

What is needed is serious identity management, and what organisation
is better suited than one that's already trusted for being neutral,
works in everybody else's favor, and also understands the perils of
traditional record keeping while having the epistemological
understanding? Sorry, but I don't see any other organisation in the
world being capable of pulling this off, so why not give the world the
greatest gift of technology right now, one which will be uniquely
yours: yourselves?

The good part about this is that it doesn't matter if we're talking
FRBR, MARC, RDF / Semantic Web, Topic Maps or anything else; these
are just URIs that they all *depend* upon. There is no reason not to
do this, and no reason not to do it right.

Anyway, shootin' from the hip and now off to wash the floors and clean my mind.

Regards,

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ ----------------------------------------------
------------------ http://www.google.com/profiles/alexander.johannesen ---