Re: Linking to mass digitized books from library catalogs: one month later

From: Maurice York <maurice_york_at_nyob>
Date: Thu, 18 Oct 2007 09:29:51 -0400
To: NGC4LIB_at_listserv.nd.edu
Hi Jan,
I'm curious about this trash-or-treasure line of thinking as a
reasoned basis for the manual effort of selection of digitized texts.
You are quite right that libraries specialize in selection and have
been doing it for thousands of years (more in generalities than
realities, since I don't believe any library with a currently
functioning collection has been around for more than a few hundred years).
But it seems to me that this is the very reason Google saw libraries
as such an attractive proposition for digitization--they have been
building high-quality collections of print materials and (presumably)
sorting much of the dross according to sustained plans over long
periods of time. When you say that the vast majority of texts in
Google are "bad quality, bad relevance", that seems more a dig at
American libraries and how we collect than at Google, since Google's
collection is no more and no less than what librarians have created.
Let me expand that a bit: it's something of a criticism of the
libraries of Spain, Germany, the Netherlands, Japan, England, and
France as well, all of which are digitizing books with Google.

I do respect the amount of effort you are putting into selecting ebook
titles for your catalog--your faculty and students are lucky to have
such mindfulness and dedication. I think very few people would argue
for dumping every book in GBS into their own catalog--that's what
Google and WorldCat are for. But if we are harvesting links to
digitized content of items that we already own (which unless I
misunderstand is the approach Tim is putting forward), then we are
simply extending the utility of the collections we have already
built--not throwing white noise into them.
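
To make the "only what we already own" idea concrete, here is a minimal Python sketch. It is not Tim's actual method; the function names, the identifier format, and the example.org URLs are all hypothetical. It assumes we have a list of identifiers for our own holdings and a list of (identifier, URL) pairs for digitized volumes, and it keeps only the links whose identifier matches something in the collection.

```python
def normalize(identifier: str) -> str:
    """Lowercase and keep only letters and digits, so 'OCLC 1001' matches 'oclc1001'."""
    return "".join(ch for ch in identifier.lower() if ch.isalnum())


def links_for_owned(owned_ids, digitized):
    """Return {identifier: url} for digitized items the library already holds.

    owned_ids: iterable of identifiers from the local catalog (hypothetical format).
    digitized: iterable of (identifier, url) pairs from a digitization source.
    """
    owned = {normalize(i) for i in owned_ids}
    return {
        normalize(ident): url
        for ident, url in digitized
        if normalize(ident) in owned
    }


# Hypothetical sample data: two held titles, two digitized volumes.
owned_ids = ["OCLC 1001", "OCLC 1002"]
digitized = [
    ("oclc1001", "https://example.org/scan/1001"),
    ("oclc9999", "https://example.org/scan/9999"),
]
result = links_for_owned(owned_ids, digitized)
```

Here only the first digitized volume would get a link, because the second is not in the local holdings; nothing outside the existing collection is added to the catalog.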

One last point I will comment on, which I think gets to the heart of
one of the issues we need to grapple with in looking at how our
catalogs behave in the broader context of the digitized environment.
That is the working theorem that academics only want to see what is
"relevant to them" rather than "everything that's available". If that
were true, libraries would be the most beloved place on the planet to
start looking for information, since we have traditionally tried to
make a friendly, peaceful environment stocked with just "what's
relevant to you". But we're not--only 1% of people start their
searches at a library catalog, and among both faculty and students
Google blows libraries, PubMed, ScienceDirect, you name it, out of
the water as the first place to go for information. By and large, they
come to the library catalog after they've found what they want
somewhere else.

My point is that we should use and promote all the tools at our
disposal for what they are good for. One of the great utilities of GBS
and similar tools in my own research is that I can discover relevant
content and leads in places I never would have imagined looking, and
which are not reflected in the cataloging and organization of the
library collections I use. Conversely, my favorite library collections
give me structured entry points for discovery that GBS currently can't
deliver. My ideal research environment would be a happy marriage of
the two, and I believe that, in an increasingly multi-disciplinary
academic environment where the best research crosses unanticipated
boundaries and pulls together unexpected avenues of thought, that is
what our libraries should strive for.

-Maurice

--
************************************
Maurice York
Associate Head, Information Technology
NCSU Libraries
North Carolina State University
Raleigh, NC 27695

maurice_york_at_ncsu.edu
Phone: 919-515-3518

On 10/18/07, Jan Szczepanski <jan.szczepanski_at_ub.gu.se> wrote:
> Thanks Steve for showing interest
>
> Steve Toub wrote:
> > Very interesting, Jan. Thanks for sharing.
> >
> >> It takes less than five minutes to create an e-record by reusing a
> >> p-record and adding the
> >> fields necessary to transform the record into an e-record.
> >
> > Are you making the edits manually or have you automated this process in
> > some way?
> I do it manually but would love automation but that seems to be a dream
> that will take 5-10 years before it hits ground.
> >
> >
> >> I have collected by myself up to today more than 17.000 e-books.
> >> I can do about 10.000 per year
> >
> > Wow! Is your employer supportive of this or are you doing this on your
> > own time?
>
> This is part of a project. My hopeless dream is that by showing the way, others
> would follow. Only a couple of small special libraries have been inspired,
> and started cataloguing OA working papers in the political field.
>
> Libraries all over the world pay for ebrary and/or NetLibrary books, in
> spite of the fact that most of the titles are uninteresting; 95% of the
> selection is below all decent quality criteria. They can fool a student,
> but how can they fool academic libraries? That's strange.
>
> In theory any library could import my 17.000 titles for free but why don't
> they do that? I can understand that nobody outside Sweden is interested
> in the Swedish e-books but why not the rest?
>
> We are still too much in the pulp business and we have handed over too
> much power to commercial companies selling "Big Deals".
>
> >
> >> So what is the point of mechanically harvesting GBS
> >> URLs if most of it
> >> is not of any value?
> >
> > Hmmm. One man's trash is another man's treasure. I think I'd have a hard
> > time convincing a faculty member at my institution that a volume we had
> > in print wasn't worth being digitized.
> That may be right, but we have specific men and women, academics, and
> they are not interested in having "everything" digitized. You can use
> Bradford's law 20/80. Only twenty percent of the Google books are of
> interest and because I'm working in a Swedish context, maybe just 5-10%
> will be of interest in Sweden.
> >
> > I've heard that the selection process takes more effort/time than the
> > technical processing--folks like Google may be scanning everything on
> > the shelf since it's too much effort to do the selection. How much time
> > do you spend on "selection" to separate the trash from the treasure?
> Compared with a librarian what is Google? We have been around now
> for thousands of years, long before Google and even universities and
> selection is our speciality. We have never acquired everything.
>
> Google is just a clever machine making a lot of money on a commercial
> market.
>
> How much time I spend "selecting"? Less than 5%, rest is boring and
> mechanical cataloguing.
>
> Yesterday afternoon I catalogued these twenty-four books:
>
> Freely available from Center for Contemporary Arab Studies
> http://ccas.georgetown.edu/research-papers.cfm
> Total: 12 free e-books 17.10.07
>
> Freely available via Swisspeace
> http://www.swisspeace.ch/typo3/en/publications/working-papers/index.html
> Total: 12 free e-books 17.10.07
>
> and selected and catalogued about 25 from:
>
> Freely available via Religion Online
> http://www.religion-online.org/listbooks.asp
>
> The titles on this list really
> range from trash to treasure. I will select less than 50% of these roughly
> two hundred titles when I continue later today with the project.
>
>
> Jan
>
>
> >
> >        --SET
>
> --
>
> Jan Szczepanski
> Förste bibliotekarie
> Goteborgs universitetsbibliotek
> Box 222
> SE 405 30 Goteborg, SWEDEN
> Tel: +46 31 773 1164 Fax: +46 31 163797
> E-mail: Jan.Szczepanski_at_ub.gu.se
>
Received on Thu Oct 18 2007 - 09:47:34 EDT