Re: Tim Berners-Lee on the Semantic Web

From: James Weinheimer <j.weinheimer_at_nyob>
Date: Wed, 21 Oct 2009 03:39:39 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

On Tue, 20 Oct 2009 11:16:37 -0400, Jonathan Rochkind <rochkind_at_JHU.EDU> wrote:

>I am confused by people saying they find it a 'minus' not a 'plus' that
>RDA is "based on FRBR".  Let's be clear -- the 'based on FRBR' that RDA
>is based on the _conceptual domain model_ outlined in FRBR. For better
>or worse, the actual 'user tasks' in FRBR are just a sideshow, RDA isn't
>really 'based' on them, and the FRBR domain model itself isn't really
>'based' on them. It's a fiction.  I am happier just ignoring the user
>tasks (regardless of how useful they are), because it's the domain model
>FRBR provides that is useful.

It seems to me that it is a basic axiom that an organization should ensure
that what they are going to create will be useful to the public before they
embark on some huge project to produce it. This is outside any "ethereal
considerations" of whether something is "best" or not. For example, someone
might come up with the idea of making the very best typewriter the world has
ever seen. It puts all the others to shame. The result? It might have been a
great idea 20 years ago, but today it doesn't matter how good the typewriter
is because people have moved on. They will not use typewriters anymore, no
matter how improved they may be. I am concerned that with RDA, we are
building a new, much improved typewriter.

Therefore, we could continue with FRBR and force the information world into
the 19th-century WEMI model (as Bernhard has demonstrated in his postings)
simply because we don't have anything better, but will anyone find it useful
besides librarians? This is the reason for the testing, which unfortunately
should have been done long ago. While we could build these dubious models
for people to take or leave if they wish (i.e. the traditional library
model), it seems to me that we ignore user tasks (i.e. utility to the users)
at our own peril. That path is much too dangerous.

>Simply expanding on AACR2 as rules for creating text is exactly the
>wrong approach. We don't need rules for creating text, we need rules for
>creating data elements in a defined domain model.  That defined domain
>model is what allows catalogers or metadata creators to use their
>"cataloger's judgement", understanding what they're doing. And is what
>defines data consumers to understand what the data they've got means
>without having to be catalogers, and to pull the data elements they want
>out of the data understanding what those data elements are intended to
>mean.  It's what allows us to create data that is flexible and will be
>useful in the future even for use cases we didn't think of previously,
>because we know what sort of data we were trying to create. Without such
>a formal domain model, you're just creating 'text', not 'data'.  Which
>is indeed what we used to do, when the text was destined for printed
>cards or pages.  But now we need data.

While this is true, where does the idea of "standards" fit in? By this, I
mean superior data or inferior data, and at some level, this devolves to
superior or inferior "text", e.g. "1st edition". Even using linked data, there
is still text involved at some level, e.g. all the text available through
http://viaf.org/viaf/12307511. Therefore, there is some text that is
standard (i.e. high-quality), and some that is not. If there are standards,
then there need to be rules, and that is where I see that AACR2/RDA comes in.

Quality was discussed at some length in the Language Log posting and I
believe even in the Chronicle, so it is considered important by the general
public. This is a tremendous opportunity for us, since we understand
"high-quality metadata standards" as no one else does. As I pointed out in my
CCRW announcement, a reconsideration of what "high standards" means is in
order because different metadata records following different standards get
mashed together, as in a Google Books "overview."

>So if the FRBR data model isn't good (enough), you can expand on it --
>or you can even abandon it entirely and create a new one. But it should
>be noted that also, for better or worse, the FRBR domain model was
>intended to be compatible with our legacy data and practices -- it is a
>formalization of cataloging tradition.   

This is a very good point, but we have been waiting for a long time to see
how our records will fit into this new information world, and it has turned
out that they don't (except relatively recently in Google Books, with poor
results). I can't get the talk with Berners-Lee out of my mind, where he
says to put up your data however you can because people will change it for
their purposes, no matter what you do. This should be considered a good
thing. I think he may be correct and it is causing a change in my thinking.

It seems to me much more important to enter the information world with
something less than perfect than to enter it too late. It would be very
interesting to see a major library put up their records for public download
and manipulation using some kind of simplified format, such as qualified DC
(full of that "text") and see what would happen as people worked with it.
The full RDF formats could come later. I think the public understands the
idea of improving a product and has come to expect it.

I'll bet that if a library did that and let it be known, it would be widely
popular. Of course, it would be even more popular if it contained records
for online publications that the user could access, and if a type of
Infomine or Intute could be used.

But one step at a time.

Jim Weinheimer