Re: Linking to mass digitized books from library catalogs: one month later

From: Stephens, Owen <o.stephens_at_nyob>
Date: Mon, 22 Oct 2007 15:57:17 +0100
To: NGC4LIB_at_listserv.nd.edu

I don't go for brutal honesty, more honest enquiry :) Jan covers quite a
lot in two recent posts, so I've got a number of different things going
on in my response...

Jan wrote:
>
> Google doesn't care for quality because they can't. They just give
> everything to everybody

I don't think this is true. My previous point was that Google believes
that you can extract quality from a large body of material using various
techniques. If Google didn't deliver some 'quality' of results then
people would stop using it. The quality of the results from Google was
exactly why I started using it instead of AltaVista back in the late
nineties (and my previous switch from Lycos to AltaVista was driven by
similar concerns).

> but we serve a special group of people, scientists, humanists,
> academicians, students and not people in real life, outside the
> university.

I think Maurice's point was that in terms of GBS, Google is using the
fact that libraries have preselected this material to at least
prioritise the material they are making available via GBS. If you would
advise your users to search the catalogues of one or more libraries
that are GBS partners, I would say that GBS is going to be relevant to
your users.

> Compare with the best American libraries and now also with Max-Planck
> Institute in Europe that has said no to "Big Deals", commercial
> packages incorporating hundreds or thousands of scientific journals
> when they found out that 80% was not used. Other libraries think
> having everything is better even if never used.
>

My take on the 'big deals' is that this is much more an economic
decision than anything else. The big deals are worth it because
subscribing to only the titles that would be selected if you had free
rein is more expensive. This is now part of the economic model. Of
course, some libraries may choose to take a stand and refuse the big
deals, to break the model and force publishers to rethink, but most
libraries can't afford to do this. It would be interesting to know how
the economics have worked for Max-Planck (are they making a stand, or
does it actually make economic sense for them?).

Money is clearly a key factor in selection - we have more requests for
journal titles from our users than we can fulfil with the budget
available, and this has been true of every library I have worked in. I
would guess that most selection is done from within a limited budget, so
at least some of the stuff we aren't collecting goes uncollected because
we can't afford it, not because we don't want it.

Where you are talking about freely available resources that don't have
a storage overhead (for example, the Project Gutenberg texts), with
automated indexing and catalogue records available, the cost is
extremely small - in this case, what is the argument for not collecting?
Presumably it has to be that 'this will never be useful to my users' or
'I judge that this material will get in the way of users finding the
material they actually want'. I think the former is a difficult
statement to make, and the latter becomes meaningless if you have
sufficiently good information retrieval systems.

If you are actually downloading the digital object and keeping a copy
locally, then possibly we go back to an economic argument - but I had
gathered you were simply cataloguing the item and linking to it?

To try a different illustration: where we have older print journal runs
that have minimal use, we can dispose of them - they cost us money to
store, and use has to be high enough to justify the cost (unless we are
providing a copy of 'last resort' where the material is unique and
valuable enough for us to want to preserve it). If we buy an Elsevier
backfile for a one-off fee, then discover that use is minimal, we
can't 'discard' it, as we don't have it - just access to it. So what
happens now? Do we remove details from the 'catalogue' (or e-journals
list etc.) on the basis that it isn't useful? This seems perverse,
especially if it is still used, however minimally. For me this
illustrates the change that happens when dealing with remotely hosted
digital material.

> We are not in the "everything goes" business, we are not commercial,
> we are just one of the most valuable pillars in human culture.
>

I don't think that Google are in the 'everything goes' business really -
they are in the 'useful goes' business. However, they clearly believe
that by casting their nets as wide as possible, and then filtering out
the rubbish, they can provide better results for any particular user.
This is perhaps because they have an extremely wide range of users, so
what is rubbish to some will be treasure to others - and I think it is
fair to say that most libraries don't have the same range of users in
mind when they build their collections. I think there is some 'long
tail' stuff here as well. Where we don't collect, it seems likely to be
stuff from the long tail that we miss, because the benefit relative to
the cost is small in a physical world - but again, in a digital world
this changes.

I'm not sure it is completely true that 'we are not commercial' - there
are many types of library, and they have different missions. In the UK
at least, universities are becoming more commercial, and the library is
there to serve the university. We are also competing against commercial
providers in some spaces, which means we have to be economically viable
to make sense. There are more explicitly commercial libraries as well.

Lots of stuff in here, but my key points would be:

- Collection development policies are partially informed by cost
  considerations - these considerations change when dealing with digital
  material
- Google isn't in the business of serving up rubbish to people - it just
  has a different approach to presenting the 'good stuff' to people
- Google is well adapted to the digital world - we can learn something
  from their approach

Owen