Re: Linking to mass digitized books from library catalogs: one month later

From: Stephens, Owen <o.stephens_at_nyob> Date: Thu, 18 Oct 2007 18:10:34 +0100 To: NGC4LIB_at_listserv.nd.edu

I'd definitely agree with the points Maurice makes about selection of material and GBS.

However, I also think it is worth examining the idea of 'selection' in the context of large amounts of digital material being available. Traditionally libraries pre-select material, offering their users a 'filtered' view of available information. When the web first came along, many people believed that the same model would work in this new context. This led to resources such as the 'Virtual Library' (http://vlib.org), and of course the Yahoo Directory.

However, search engines, and of course especially Google, showed that you could take an alternative approach of indexing as much as possible, and then 'post-selecting' by ranking results in order of relevance by automated means. Overall I think it is relatively clear that search engines have been more successful in their approach (to date at least).

There is clearly a challenge here - as more material is either born digital, or digitised, does the pre-selection model still apply, or is post-selection going to become the norm? If the latter, what does it mean for libraries?

Owen

(Takes cover)

-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Maurice York
Sent: 18 October 2007 14:30
To: NGC4LIB_at_listserv.nd.edu
Subject: Re: [NGC4LIB] Linking to mass digitized books from library catalogs: one month later

Hi Jan,
I'm curious about this trash-or-treasure line of thinking as a reasoned basis for the manual effort of selection of digitized texts.
You are quite right that libraries specialize in selection and have been doing it for thousands of years (more in generalities than realities, since I don't believe any library with a currently functioning collection has been around for more than a few hundred years).
But it seems to me that this is the very reason Google saw libraries as such an attractive proposition for digitization--they have been building high-quality collections of print materials and (presumably) sorting much of the dross according to sustained plans over long periods of time. When you say that the vast majority of texts in Google are "bad quality, bad relevance", that seems more a dig at American libraries and how we collect than at Google, since Google's collection is no more and no less than what librarians have created.
Let me expand that a bit... it's something of a criticism of the libraries of Spain, Germany, the Netherlands, Japan, England, and France as well, all of whom are digitizing books with Google.

I do respect the amount of effort you are putting into selecting ebook titles for your catalog--your faculty and students are lucky to have such mindfulness and dedication. I think very few people would argue for dumping every book in GBS into their own catalog--that's what Google and WorldCat are for. But if we are harvesting links to digitized content of items that we already own (which unless I misunderstand is the approach Tim is putting forward), then we are simply extending the utility of the collections we have already built--not throwing white noise into them.

One last point I will comment on, which I think gets to the heart of one of the issues we need to grapple with in looking at how our catalogs behave in the broader context of the digitized environment.
That is the working theorem that academics only want to see what is "relevant to them" rather than "everything that's available". If that were true, libraries would be the most beloved place on the planet to start looking for information, since we have traditionally tried to make a friendly, peaceful environment stocked with just "what's relevant to you". But we're not--only 1% of people start their searches at a library catalog, and among both faculty and students Google blows libraries, PubMed, ScienceDirect, you name it, out of the water as the first place to go for information. By and large, they come to the library catalog after they've found what they want somewhere else.

My point is that we should use and promote all the tools at our disposal for what they are good for. One of the great utilities of GBS and similar tools in my own research is that I can discover relevant content and leads in places I never would have imagined looking, and which are not reflected in the cataloging and organization of the library collections I use. Conversely, my favorite library collections give me structured entry points for discovery that GBS currently can't deliver. My ideal research environment would be a happy marriage of the two, and I believe that, in an increasingly multi-disciplinary academic environment where the best research crosses unanticipated boundaries and pulls together unexpected avenues of thought, that is what our libraries should strive for.

-Maurice

--
************************************
Maurice York
Associate Head, Information Technology
NCSU Libraries
North Carolina State University
Raleigh, NC 27695

maurice_york_at_ncsu.edu
Phone: 919-515-3518

On 10/18/07, Jan Szczepanski <jan.szczepanski_at_ub.gu.se> wrote:
> Thanks Steve for showing interest
>
> Steve Toub wrote:
> > Very interesting, Jan. Thanks for sharing.
> >
> >> It takes less than five minutes to create an e-record by reusing a
> >> p-record and adding the fields necessary to transform the record to an
> >> e-record.
> >
> > Are you making the edits manually or have you automated this process
> > in some way?
> I do it manually. I would love automation, but that seems to be a
> dream that will take 5-10 years to become reality.
> >
> >
> >> I have collected by myself up to today more than 17.000 e-books.
> >> I can do about 10.000 per year
> >
> > Wow! Is your employer supportive of this or are you doing this on
> > your own time?
>
> This is part of a project. My hopeless dream is that, by showing the way,
> others would follow. Only a couple of small special libraries have
> been inspired, and started cataloguing OA working papers in the political field.
>
> Libraries all over the world pay for ebrary and/or NetLibrary books,
> in spite of the fact that most of the titles are uninteresting; 95% of
> the selection is below all decent quality criteria. They can fool
> a student, but how can they fool academic libraries? That's strange.
>
> In theory any library could import my 17.000 titles for free but why
> don't they do that? I can understand that nobody outside Sweden is
> interested in the Swedish e-books but why not the rest?
>
> We are still too much in the pulp business and we have handed over too
> much power to commercial companies selling "Big Deals".
>
> >
> >> So what is the point of mechanically harvesting GBS URLs if most of it
> >> is not of any value?
> >
> > Hmmm. One man's trash is another man's treasure. I think I'd have a
> > hard time convincing a faculty member at my institution that a
> > volume we had in print wasn't worth being digitized.
> That may be right, but we have specific men and women, academics, and
> they are not interested in having "everything" digitized. You can
> use Bradford's law 20/80. Only twenty percent of the Google books are
> of interest and because I'm working in a Swedish context, maybe just
> 5-10% will be of interest in Sweden.
> >
> > I've heard that the selection process takes more effort/time than
> > the technical processing--folks like Google may be scanning
> > everything on the shelf since it's too much effort to do the
> > selection. How much time do you spend on "selection" to separate the trash from the treasure?
> Compared with a librarian what is Google? We have been around now for
> thousands of years, long before Google and even universities and
> selection is our speciality. We have never acquired everything.
>
> Google is just a clever machine making a lot of money on a commercial
> market.
>
> How much time do I spend "selecting"? Less than 5%; the rest is boring
> and mechanical cataloguing.
>
> Yesterday afternoon I catalogued these twenty-four books:
>
> Freely available from the Center for Contemporary Arab Studies
> http://ccas.georgetown.edu/research-papers.cfm
> Total: 12 free e-books 17.10.07
>
> Freely available via Swisspeace
> http://www.swisspeace.ch/typo3/en/publications/working-papers/index.html
> Total: 12 free e-books 17.10.07
>
> and selected and catalogued about 25 from:
>
> Freely available via Religion Online
> http://www.religion-online.org/listbooks.asp
>
> The titles on this list really
> range from trash to treasure. I will select less than 50% of these
> roughly two hundred titles when I continue with the project later today.
>
>
> Jan
>
>
> >
> >        --SET
>
> --
>
> Jan Szczepanski
> Förste bibliotekarie
> Goteborgs universitetsbibliotek
> Box 222
> SE 405 30 Goteborg, SWEDEN
> Tel: +46 31 773 1164 Fax: +46 31 163797
> E-mail: Jan.Szczepanski_at_ub.gu.se
>