One Data Format Identifier (and Registry) to Rule Them All

From: Ross Singer <rossfsinger_at_nyob>
Date: Thu, 30 Apr 2009 14:59:31 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU
Hello everybody.  I apologize for the crossposting, but this is an
area that could (potentially) affect every one of these groups.  I
realize that not everybody will be able to respond to all lists,
but...

First of all, some back story (Code4Lib subscribers can probably skip ahead):

Jangle [1] requires URIs to explicitly declare the format of the data
it is transporting (binary marc, marcxml, vcard, DLF
simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
own URI structure for this (http://jangle.org/vocab/formats#...) but
this has always been with the intention of moving out of jangle.org
into a more "generic" space so it could be used by other
initiatives.

This same concept came up in UnAPI [2] (I think this thread:
http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
discusses it a bit; there is a reference there suggesting it may have
come up before), although it was ultimately rejected in favor of an
(optional) approach more in line with how OAI-PMH disambiguates
metadata formats.  That being said, this page used to try to set some
sort of convention around the UnAPI formats:
http://unapi.stikipad.com/unapi/show/existing+formats
But it's now just a squatter page.

Jakob Voss pointed out that SRU has a schema registry and that it
would make sense to coordinate with this rather than mint new URIs for
things that have already been defined there:
http://www.loc.gov/standards/sru/resources/schemas.html

This, of course, made a lot of sense.  It also made me realize that
OpenURL *also* has a registry of metadata formats:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

The problem here is that OpenURL and SRW are using different info URIs
to describe the same things:

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one
claims it's an identifier for ONIX 2.1, but if I weren't sending this
email now, SRU would eventually have registered
info:srw/schema/1/onix-v2.1

There are several other examples, as well (MODS, ISO20775, etc.) and
it's not a stretch to envision more in the future.
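
To make the mismatch concrete, here is a minimal sketch in Python of the
kind of crosswalk a client has to maintain today in order to treat the two
registries' identifiers as the same format.  The labels "marcxml" and
"onix" and the function name same_format are hypothetical, made up for
illustration; only the four info URIs come from the registries above.
Grouping the two ONIX identifiers together is also an assumption, since
they name different versions.

```python
# Hypothetical crosswalk between SRU and OpenURL format identifiers.
# The labels on the left are illustrative only, not registered names.
EQUIVALENTS = {
    "marcxml": {
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
    },
    "onix": {
        # Assumption: treated as one family even though the versions differ.
        "info:srw/schema/1/onix-v2.0",   # ONIX 2.0 in the SRU registry
        "info:ofi/fmt:xml:xsd:onix",     # ONIX 2.1 in the OpenURL registry
    },
}

# Invert the table so each URI points at its shared label.
LABEL_BY_URI = {
    uri: label
    for label, uris in EQUIVALENTS.items()
    for uri in uris
}


def same_format(uri_a, uri_b):
    """Return True if both URIs are known and map to the same label."""
    label_a = LABEL_BY_URI.get(uri_a)
    return label_a is not None and label_a == LABEL_BY_URI.get(uri_b)


if __name__ == "__main__":
    print(same_format("info:srw/schema/1/marcxml-v1.1",
                      "info:ofi/fmt:xml:xsd:MARC21"))
```

Every spec or application that wants to accept both identifiers ends up
carrying its own copy of a table like this, which is exactly the
duplication a shared registry would remove.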

So there are a couple of questions here.

First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?

Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently for reuse across various specs and
protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to
just say "hey, here's my FOAF available via UnAPI"
3) be so lax that it throws all hope of authority out the window
?

I would expect the various communities to still maintain their own
registries of "approved" data formats (well, OpenURL and SRU, anyway
-- it's not as appropriate to UnAPI or Jangle).

Does something like this interest any of you?  Is there value in such
an initiative?

Thanks,
-Ross.
Received on Thu Apr 30 2009 - 15:01:40 EDT