Re: Aggregation of metadata

From: Diane I. Hillmann <dih1_at_nyob>
Date: Tue, 16 Feb 2010 14:49:42 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

On 2/16/10 4:57 AM, Till Kinstler wrote:
>
> The whole data aggregation workflow really is a pain. Much effort goes
> into analyzation and conversion of all the funny things you get from
> publishers (few deliver bibliographic record formats like MARC21, but
> most provide some homebrewn XMLish formats, Excel sheets, formatted
> text files... worst thing was a >10000 pages Word document containing
> "records" as formatted paragraphs, weird...).
>
I think it's pretty common to underestimate the effort, skill and time 
it takes to aggregate disparate batches of metadata into a cohesive 
whole.  The challenges go far beyond the mapping problems (I, too, have 
parsed Word documents into metadata records!), and doing it well really 
requires some fairly careful data management strategies to be
cost-effective over time.  When I was working with the National Science
Digital Library (NSDL) project some years ago, we spent quite a bit of 
time working on these challenges and published a clutch of papers about 
our work, which may be of help for those of you working in the 
aggregation area.  They can be found on the DCMI papers site 
(http://dcpapers.dublincore.org/ojs/pubs) with a search under author 
name "Hillmann" or on our consulting site:
http://managemetadata.org/ under the "about" tab.

Diane Hillmann