Re: Aggregation of metadata

From: Anders Söderbäck <Anders.Soderback_at_nyob> Date: Thu, 18 Feb 2010 10:32:10 +0100 To: NGC4LIB_at_LISTSERV.ND.EDU

Ross, you make a good devil's advocate, and all your arguments make a lot of sense. Since I am part of the team that, together with Marja, is investigating this issue with regard to the Swedish LIBRIS union catalog, I will try my best to respond to them. I definitely think this is a discussion worth having.

> How would this (in and of itself) eliminate information silos?  On the
> surface, it would initially create more.  Summon, after all, is just
> another silo, albeit a really big one.

It doesn't. Summon is nothing but a big silo. Libraries doing the aggregation themselves would run a very large risk of creating more silos. Since the commercial vendors usually hesitate to publish their data openly (Nature deserves credit for being forward-thinking and publishing article-level metadata through OAI-PMH), the creation of silos seems hard to avoid unless we can convince the publishers that metadata should be open and available to everyone. Open data, I believe, would benefit both libraries and vendors. However, this benefit will only be possible if libraries are willing to do some of the dirty work themselves rather than paying the vendors to do it.

Summon (or rather any of the current vendor silos, though Summon as of today seems to be the most successful) represents another solution to the problem of non-open data. In the world of silos, one big silo is better than many small silos. And some publishers might be more comfortable with aggregating data through a few big silos. I am afraid, though, that the one-big-silo world is a step in the wrong direction. One big silo (be it WorldCat, Summon or whatever) means one big obstacle on the road to the world of no silos. It is probably much more difficult to get Summon to publish open data; I would guess their data is bound by several licensing agreements.

> Secondly, doesn't this scenario set up the same problem we're
> currently trying to struggle with (every library has the same copy of
> the same MARC record, to manage and maintain in a suboptimal system to
> managing and maintaining them) except at orders of magnitude greater
> scale?

Yes. I am afraid this scenario is even worse than the problem with the current MARC silos. In today's world, we have Z39.50 (we might not like it, but it's there and it's being used) and we have a rough consensus among libraries that record sharing is a good thing. When it comes to data aggregation, the data seem to be less open (since it is almost always created by someone other than the in-house catalogers, and making aggregated data available through OAI-PMH is more visible than making individual records accessible through Z39.50). Also, we don't have a rough consensus on how to cooperate on aggregated data.

That being said, I would be very interested in discussing possible ways of cooperating on data aggregation. If the aim is a world of no silos (and I am naive enough to think that this is what we all want), we need to find ways of cooperating around this issue. Which is one of the reasons for Marja's question - if data is being aggregated somewhere outside of Scandinavia, there might also be interesting ideas about cooperation around this issue. If we, as LIBRIS, were to do this, it would at least benefit all Swedish academic libraries. Hopefully more...

> Let me just say that I don't necessarily subscribe to these arguments
> I'm making (although I am definitely interested in seeing responses to
> them) but at the same time, I'm also not convinced that aggregated
> indexes are necessarily the "solution" either (although, yes, in
> comparison to federated search, probably).

You are probably correct that aggregated indexes are not the "solution". Or rather, they are not *the* solution, only a solution to the problems of federated search. (But wasn't federated search being marketed a few years back as the solution to the problems of data aggregation?) I would definitely say that actually having possession of the data (as Eric says), actually having it in your own database, gives a lot of possibilities and a lot of flexibility that you don't get when accessing a vendor-owned silo through an API. However, getting the data into your own database is, as has been stated by, for example, Till and Diane, painful. Libraries seem to me to be faced with the choice of "pain and flexibility" vs. "less pain and less flexibility". For most libraries, the pain of doing their own aggregation is probably unbearable. For other libraries, it might just be too expensive. However, if we as libraries want flexibility, we need to take this into consideration when signing up for vendor-aggregated indexes. What is the future for libraries when the vendors control the means of aggregation (Marxist pun intended)?

(Note that I am not complaining about vendors or vendor solutions in general. Not even about vendor-aggregated indexes. Though I personally don't like the one-big-silo scenario, it can be argued that this is the only realistic solution, while the no-silo solution is too utopian to be useful. I also know that there are vendors working towards the no-silo world, and some libraries that seem very keen on keeping their silos. And I can definitely see a future where I myself sign up for a vendor-aggregated silo. I just want to know what my options are...

Also, the long-term solution I see with regard to both aggregation/federation and cooperation around aggregation is, in three words, Linked Open Data. However, the aggregated silos are here today, while LOD might still be the hope of a better tomorrow.)

Best regards
Anders Söderbäck

-----Original message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On behalf of Ross Singer
Sent: 17 February 2010 04:16
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Aggregation of metadata

On Tue, Feb 16, 2010 at 2:26 PM, Eric Lease Morgan <emorgan_at_nd.edu> wrote:

> Actually having possession of the data/metadata opens up quite a number of possibilities. The creation of a unified search interface is just one of them. No information silos. If the data/metadata is all in the same index, then relevancy ranking algorithms and statistical analysis will be much more valid.

Actually, just to play devil's advocate here:

How would this (in and of itself) eliminate information silos?  On the
surface, it would initially create more.  Summon, after all, is just
another silo, albeit a really big one.

Secondly, doesn't this scenario set up the same problem we're
currently trying to struggle with (every library has the same copy of
the same MARC record, to manage and maintain in a suboptimal system to
managing and maintaining them) except at orders of magnitude greater
scale?

Let me just say that I don't necessarily subscribe to these arguments
I'm making (although I am definitely interested in seeing responses to
them) but at the same time, I'm also not convinced that aggregated
indexes are necessarily the "solution" either (although, yes, in
comparison to federated search, probably).

-Ross.

Libraries have to make a choice: less pain/less flexibility vs. more pain/more flexibility.