Comments on Martha Yee's article about RDF

From: James Weinheimer <j.weinheimer_at_nyob> Date: Tue, 14 Jul 2009 10:52:07 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

All,

Here are some of my thoughts about Martha Yee's article on RDF.

This is a short discussion of Martha M. Yee, "Can Bibliographic Data be Put
Directly onto the Semantic Web?" Information Technology & Libraries 28, no. 2
(June 2009): 55-80.

Martha Yee's article focuses on her efforts to map current bibliographic
entities and attributes into an RDF structure. While I find Martha's efforts
very interesting and appreciate all the work and thought behind them, the
approach seems a bit out of place to me, since the main purpose of RDF is to
share information, especially small bits of information. Although RDF
probably can be used for storage and display, it is not designed primarily
for those purposes. Most likely, records collected through RDF will be
stored in relational databases or in XML. This parallels what has happened
to MARC: it has become obsolete as a storage medium, since records are now
stored mostly in relational databases, and it is used only as a
communications format, i.e. for transferring records from one library to
another.

My own approach to the practical uses of RDF is completely different from
Martha's. To me, librarians should not approach RDF as simply a "new and
improved version" of the MARC format. I prefer to think of it in terms of:
What are the new things it can do? Even though RDF is designed primarily for
sharing (i.e. transferring information), does it open up possibilities that
have been beyond our imaginations? I believe it does, and if we embrace it,
RDF could completely revolutionize how we do our work.

With RDF, you can take bits and pieces of information from all over the
place and remix them for your needs, so it is completely different from
transferring a MARC record using Z39.50. But what does this mean and what
are the practical consequences? Theoretically, you could take information
from all kinds of specialized places, e.g. information about the individual
resource from the publisher, information about the creator from a name
server, information about subjects from a subject server, information about
citations from somewhere else, discussions and reviews from other places,
and all of this can come together on your screen. 
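To make the remixing idea a bit more concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the URIs, the
data, and the helper functions merge, describe, and build_record are all
hypothetical, and the triples are plain tuples with abbreviated predicate
names (dc:title, foaf:name) rather than a real RDF library. Each "server"
publishes subject-predicate-object statements; because they share URIs,
combining them is just a set union, and a display record is assembled by
collecting everything said about one URI.

```python
# A minimal sketch of RDF-style remixing using plain tuples (not a real RDF library).
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject URI, predicate, object)

# Hypothetical URIs and data, for illustration only.
BOOK = "http://example.org/publisher/book/123"
AUTHOR = "http://example.org/names/person/42"

# Each list stands in for a different specialized source.
publisher_triples: list[Triple] = [
    (BOOK, "dc:title", "An Example Title"),
    (BOOK, "dc:creator", AUTHOR),
]
name_server_triples: list[Triple] = [
    (AUTHOR, "foaf:name", "Doe, Jane"),
]
subject_server_triples: list[Triple] = [
    (BOOK, "dc:subject", "http://example.org/subjects/789"),
]


def merge(*sources: list[Triple]) -> set[Triple]:
    """Union all sources into one graph; identical statements collapse."""
    graph: set[Triple] = set()
    for source in sources:
        graph.update(source)
    return graph


def describe(graph: set[Triple], subject: str) -> dict[str, list[str]]:
    """Group every statement about one URI, whichever source supplied it."""
    view: dict[str, list[str]] = defaultdict(list)
    for s, p, o in sorted(graph):
        if s == subject:
            view[p].append(o)
    return dict(view)


def build_record(graph: set[Triple], subject: str) -> dict[str, list[str]]:
    """Assemble a 'mashup' record, following creator links to the name data."""
    record = describe(graph, subject)
    record["creator_names"] = [
        name
        for creator in record.get("dc:creator", [])
        for name in describe(graph, creator).get("foaf:name", [])
    ]
    return record


graph = merge(publisher_triples, name_server_triples, subject_server_triples)
record = build_record(graph, BOOK)
```

Here the title comes from the publisher, the subject from the subject
server, and the author's name from the name server, yet the resulting
record treats them as one description because they all hang off the same
URIs.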

So for me, the main lesson from my understanding of RDF is that the
bibliographic record, along with the bibliographic universe itself, is
fragmenting. The purpose of implementing RDF should not be to do the same
things we do today with different tools, or simply to implement some kind
of FRBR structure, but to seize the new capabilities to build something
that will be much better for both ourselves and our patrons, and that lets
us do everything far more efficiently. With the introduction of RDF, then,
an individual record will come to resemble a web mashup more and more.
While this could be done in real time, it probably won't be fully
implemented in the foreseeable future because the Internet is still too
inefficient (as Martha Yee points out on p. 65). Still, automatic updates
to the metadata could be enabled relatively simply, and it is important to
realize that some of this technology is already in place and used in library
catalogs, for example through the Google Book Search API, Amazon Web
Services, and so on.

In such a fragmented scenario, responsibility for each part of the record
could be assumed by different communities. For example, imagine that a
French organization were to become responsible for the personal name server.
What would such a tool look like? Possibly much like VIAF, with all kinds
of forms and references (http://viaf.org/). It could collect information in
a number of ways, certainly from metadata creators, but also from authors'
personal web pages. Using an RDF-enabled system (such as Drupal), an author
could update his or her own page, the French name server could take the
updates automatically, and since everyone in the world would use the French
name server's URIs, all could benefit from the author's work. Similar tools
could be built for publication information, which could become the
responsibility of publishers, who could share it so that all could benefit.
The same could be done for corporate names, subjects, and other parts, or
entirely new parts, of the metadata record.

As innovative tools are built that can take advantage of the power of URIs,
it becomes difficult even to imagine the possibilities in the future. In
theory, you wouldn't need a database at all, since all the information could
be stored on different machines around the world, and with federated
searching, you would just need collections of URIs brought together
virtually on your computer screen.

This would be one example of how libraries and the entire metadata community
could truly cooperate, and in this way everyone would get the help they
sorely need. But I am sure that for many, such a scenario is bizarre and
horrifying. Not only does the unit record completely fall away, but, more
ominously, what role would the librarian have in such a world? It would
mean opening things up and relinquishing control over some very sensitive
areas that have always been the domain of librarians. An old-style
librarian may indeed have very little to do, but such a system would still
demand librarian-type skills for general management and to ensure that the
data maintains an acceptable level of quality. And, Google's protests to
the contrary, I personally do not believe that knowledge will ever organize
itself, no matter how hair-raising their algorithms may become. Finally,
the information universe can only become more complex, and anyone wanting
to enter it will always need help.

Therefore, I don't think there is any reason for librarians to fear for
their existence in such a future. Setting it up will be quite demanding (how
about getting agreement on standards?!), maintaining it even more so. There
will be plenty for everyone to do.

(For Karen Coyle's comments on Martha Yee's article, quite different from
mine, see her blog at:
http://kcoyle.blogspot.com/2009/07/yee-on-rdf-and-bibliographic-data.html)

Jim Weinheimer