Re: Special OAIster Announcement from OCLC

From: James Weinheimer <j.weinheimer_at_nyob>
Date: Wed, 23 Sep 2009 03:11:27 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
So if I may ask what may be a silly question, where exactly is the expense
in harvesting? Certainly you need a decent computer system and spiders, but
according to the message at: http://hangingtogether.org/?p=738

"Harvesting is hard. As anyone who has done this work will tell you,
harvesting records using the Open Archives Initiative Protocol for Metadata
Harvesting (OAI-PMH) is far from simple. There are all kinds of
difficulties, not the least of which is the uneven support of the protocol
by the wide variety of repository platforms. Community awareness of these
problems led to the formation of an NSDL and DLF-sponsored working group
that produced a web site devoted to “Best Practices for OAI Data Provider
Implementations and Shareable Metadata”. Since this is a difficult process,
we may not get everything right from the beginning, but with help from the
University of Michigan during this transition we’re hopeful that we can not
only reach, but eventually exceed, what has gone before."

and

"We are seeking to provide long-term scalability for this service and we ask
for the cooperation of data providers. Something that is likely not widely
known is that the University of Michigan would perform specialized
processing of the retrieved records because of standards noncompliance by
some data providers. In order to sustain this service over the long haul, we
will need to work with data providers to reduce the number of exceptions to
standard procedures."

I did some of this in a previous job and yes, getting decent OAI-PMH is
difficult, and lots of corrections are needed before records can be
processed, but that effort should be confined to the initial loads, until
the problems are corrected. Where I was, we focused only on getting valid
OAI-PMH coding and not on the content. If we had had to check for standard
forms of names and so on, it would never have gotten done. Was Michigan
updating at that level and expecting everyone to follow "content" or
"semantic" standards as well?
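
To make the distinction between valid OAI-PMH coding and content standards
concrete, here is a minimal sketch in Python of the kind of purely structural
check I mean. It is an illustration only, not what Michigan, OCLC, or my
former employer actually ran. The function name check_record and the sample
record (including its identifier and creator value) are hypothetical; the
element names and namespace are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def check_record(xml_text):
    """Return a list of structural problems; an empty list means none found.

    Hypothetical helper: it looks only at the OAI-PMH wrapper, never at
    the content of the metadata itself.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    if root.tag != f"{{{OAI_NS}}}record":
        return ["root element is not an OAI-PMH <record>"]

    problems = []
    header = root.find(f"{{{OAI_NS}}}header")
    if header is None:
        problems.append("missing <header>")
    else:
        for name in ("identifier", "datestamp"):
            el = header.find(f"{{{OAI_NS}}}{name}")
            if el is None or not (el.text or "").strip():
                problems.append(f"missing or empty <{name}>")

    # Deleted records are allowed to omit <metadata>.
    deleted = header is not None and header.get("status") == "deleted"
    if not deleted and root.find(f"{{{OAI_NS}}}metadata") is None:
        problems.append("missing <metadata>")
    return problems


# Hypothetical record: structurally fine, but the creator is not in any
# standard form of name. The check deliberately does not care.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1</identifier>
    <datestamp>2009-09-23</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>smith j</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(check_record(SAMPLE))
```

The sample passes even though its creator value follows no authority file.
Catching that would require a second, far more expensive layer of checking,
which is the "content" or "semantic" level I am asking about.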

While I believe in high standards, it seems to me that solutions must
be sought on the side of those who create the records and not those who
receive them. Otherwise, the constant updating of inferior records will be
unsustainable at the receiving end. If the entire affair proves to be too
difficult, as it apparently was for Google when they went with
*sitemaps* (I hope I am correct this time!), then the standards themselves
must change to become something that others can follow. If we mandated
that all automobiles sold must get 300 miles per gallon, that might
be well and good, but nobody would have an automobile (legally, anyway).
Standards should be high, but also achievable.

Jim Weinheimer
Received on Wed Sep 23 2009 - 03:12:49 EDT