Re: Aggregation of metadata

From: Alejandro Garza Gonzalez <alejandro.garza_at_nyob> Date: Fri, 19 Feb 2010 17:04:56 -0600 To: NGC4LIB_at_LISTSERV.ND.EDU

I kind of thought that "aggregating metadata" meant enriching an
existing description (say, adding cover images, tables of contents,
links to the author's picture, etc. to a basic MARC record) using
external sources, rather than adding more descriptions of more
(different) things.

This posed (to me) the additional problem of deciding which source is
best for a given kind of enrichment data (say, you found tables of
contents in 3 places... which do you want?), as well as converting each
dataset into something *usable* =)
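
A minimal sketch of the "which do you want?" problem, assuming a fixed, locally chosen ranking of sources: the record is enriched from the first source, in priority order, whose value is still usable after normalization. The source names (`publisher`, `vendor_a`, `vendor_b`), the plain-dict record layout, and the helper names are all hypothetical.

```python
# Hypothetical source names, ranked from most to least preferred; a real
# workflow would use its own providers and its own ranking.
SOURCE_PRIORITY = ["publisher", "vendor_a", "vendor_b"]

def normalize_toc(raw):
    """Turn a table of contents into a list of trimmed, non-empty entry titles.

    Accepts a list of strings, or one string whose entries are separated by
    newlines or by " -- " (the separator conventionally used in MARC 505 notes).
    """
    if isinstance(raw, str):
        parts = raw.replace("\n", " -- ").split(" -- ")
    else:
        parts = list(raw)
    return [part.strip() for part in parts if part and part.strip()]

def enrich(record, candidates, field, normalize, priority=SOURCE_PRIORITY):
    """Return a copy of `record` with `field` taken from the highest-priority
    source whose value is still non-empty after normalization.

    The winning source is recorded under record["provenance"][field]; the
    original record is left unchanged.
    """
    enriched = dict(record)
    for source in priority:
        value = candidates.get(source)
        if value is None:
            continue
        cleaned = normalize(value)
        if cleaned:
            enriched[field] = cleaned
            enriched["provenance"] = {**record.get("provenance", {}), field: source}
            break
    return enriched
```

For instance, if the publisher supplies an empty contents note and `vendor_a` supplies a list of chapters, the record gets `vendor_a`'s entries and its provenance says so. A fixed ranking is the simplest policy; scoring candidates by completeness or recency would be an alternative.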

_alejandro

Eric Lease Morgan said the following on 16/02/2010 01:26 p.m.:
> Marja Haapalainen wrote:
>
>    
>> ...We are currently looking at different ways of aggregating
>> metadata (mainly about e-articles, e-journals and e-books) from
>> different sources/publishers...
>>      
>
> Till Kinstler wrote:
>
>    
>> The whole data aggregation workflow really is a pain....
>>      
>
> I think the idea of aggregating metadata is/was long in coming. Kudos. And yes, one of the bigger challenges, besides the licensing/purchasing issues, will be the workflow and data normalization processes.
>
> On the other hand, the benefits can be huge. I see libraries as institutions who do collection, preservation, organization, and dissemination of data, information, and knowledge for specific audiences. By aggregating the metadata -- as opposed to licensing access to it -- libraries can fulfill their goals and provide useful services at the same time.
>
> Actually having possession of the data/metadata opens up quite a number of possibilities. The creation of a unified search interface is just one of them. No information silos. If the data/metadata is all in the same index, then relevancy ranking algorithms and statistical analysis will be much more valid. (Federated searching. Ick.) Once the data is housed locally, it can be more easily integrated into the wider community of the library. It is easier to seamlessly insert it into course management systems. It is easier to create current awareness services. It is easier to distribute. Once the data/metadata is housed locally, then it is easier to apply "digital humanities" computing techniques against it. Create concordances. Extract statistically significant n-grams (one-, two-, or n-word phrases). There are things called "champion lists" -- sets of words denoting a theme -- that can then be applied to a corpus and used to calculate which items are more relevant. With direct access to the data/metadata it is easier to literally chart and graph texts, thus illustrating similarities between them. Access to the full text allows you to find similar phrases or paragraphs between texts, and this leads to the ability to trace ideas and authors through time.
>
> Finding information is much less of a problem compared to ten or fifteen years ago. Everybody can find. What is needed now are tools that allow you to USE. This, I believe, is a big opportunity for the profession. Having the data/metadata makes such a thing much more possible.
>
>    
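
Several of the techniques mentioned in the quoted message (concordances, frequent n-grams, champion lists, shared phrases between texts) are simple to prototype once the text is held locally. A minimal sketch, assuming plain-text input and a naive regular-expression tokenizer; all function names are hypothetical, raw frequency counts stand in for a real significance test (such as log-likelihood), and the champion-list score is simply the share of a text's words that appear on the list, which is only one possible weighting.

```python
import re
from collections import Counter

# Illustrative sketch only: a naive regex tokenizer and raw frequency counts
# stand in for proper linguistic processing and significance testing.

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """All n-word phrases, as tuples, in the order they occur."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text, n, k=5):
    """The k most frequent n-grams (raw counts, not a significance test)."""
    return Counter(ngrams(tokenize(text), n)).most_common(k)

def concordance(text, keyword, width=3):
    """Keyword-in-context lines showing `width` words on either side of each hit."""
    tokens = tokenize(text)
    key = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

def champion_score(text, champion_list):
    """Fraction of a text's tokens that appear in a theme's champion list (0.0 to 1.0)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    champions = {word.lower() for word in champion_list}
    return sum(token in champions for token in tokens) / len(tokens)

def rank_by_theme(corpus, champion_list):
    """Sort (item_id, text) pairs by champion-list score, highest first."""
    return sorted(corpus, key=lambda item: champion_score(item[1], champion_list), reverse=True)

def shared_phrases(text_a, text_b, n=3):
    """n-word phrases occurring in both texts: a crude signal of borrowed passages."""
    return set(ngrams(tokenize(text_a), n)) & set(ngrams(tokenize(text_b), n))
```

Tracing ideas across a real corpus would need longer phrases, fuzzy matching, and an index rather than pairwise set intersection, but the shape of the computation is the same.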

-- 
_________________ ___ _ _ _ _ _ _ _
*Ing. Alejandro Garza González*
Project Coordination and Systems Development
Centro Innov_at_TE, Center for Innovation in Technology and Education
Tecnológico de Monterrey

Tel. +52 [81] 8358.2000, Ext. 6751
Intercampus link: 80.689.6751, 80.788.6106
http://www.itesm.mx/innovate/

The content of this data transmission must not be considered an offer, 
proposal, understanding or agreement unless it is confirmed in a 
document signed by a legal representative of ITESM. The content of this 
data transmission is confidential and is intended to be delivered only 
to the addressees. Therefore, it shall not be distributed and/or 
disclosed through any means without the authorization of the original 
sender. If you are not the addressee, you are forbidden from using it, 
either totally or partially, for any purpose.