On 2/16/10 4:57 AM, Till Kinstler wrote:
>
> The whole data aggregation workflow really is a pain. Much effort goes
> into analyzation and conversion of all the funny things you get from
> publishers (few deliver bibliographic record formats like MARC21, but
> most provide some homebrewn XMLish formats, Excel sheets, formatted
> text files... worst thing was a >10000 pages Word document containing
> "records" as formatted paragraphs, weird...).
>
I think it's pretty common to underestimate the effort, skill and time
it takes to aggregate disparate batches of metadata into a cohesive
whole. The challenges go far beyond the mapping problems (I, too, have
parsed Word documents into metadata records!), and doing it well really
requires some fairly careful data management strategies to be
cost-effective over time. When I was working with the National Science
Digital Library (NSDL) project some years ago, we spent quite a bit of
time working on these challenges and published a clutch of papers about
our work, which may be of help for those of you working in the
aggregation area. They can be found on the DCMI papers site
(http://dcpapers.dublincore.org/ojs/pubs) with a search under author
name "Hillmann" or on our consulting site:
http://managemetadata.org/ under the "about" tab.
Diane Hillmann
Received on Tue Feb 16 2010 - 14:50:10 EST