million books problem

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Thu, 6 Mar 2008 13:47:04 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

> What can be done with the very large digital collections
> generated by mass digitization projects? What services do
> scholars need? How do we manage digital collections when the
> material is abundant rather than selective? What systems or
> infrastructure is necessary to provide services and
> materials to scholars?
>
>   http://www.clir.org/activities/digitalscholar/

The report/article linked above describes and echoes the discussions
of a CLIR one-day seminar surrounding the "million books" problem,
namely, what sorts of changes might librarianship and scholarship
experience with the advent of millions of easily accessible full-text
books. Think Google Books and the Open Content Alliance (OCA). I
think the report is food for thought regarding the definition of
"next generation" library catalogs.

With the advent of books from Google Books and the OCA, libraries
will need to think not only of the curation of the book as object but
also of the book as container. We will need to understand
that computers are able to "read" books much faster than humans; a
computer's scale is much larger than a human's. While a computer's
interpretation or understanding of the content of the book is
negligible, the computer is able to count words and thus summarize
texts, find patterns in and between texts, do rudimentary translation
of texts, graph and chart this statistical analysis, suggest other
works which may be similar, map events or places, track references,
infer relationships, do rudimentary linguistic morphology, etc. As
the report highlighted, this is not science-fiction. This sort of
work is being done now.
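
To make the "count words ... suggest other works which may be
similar" idea concrete, here is a minimal sketch in Python. It is not
taken from the report. The function names and the three tiny sample
"books" are made up for illustration, and cosine similarity over raw
word counts is just one simple technique I am assuming here; real
services would likely remove stop words, weight terms, and so on.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count runs of letters (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, counts_by_title):
    """Return the title of the other text whose word counts are closest to `title`."""
    target = counts_by_title[title]
    others = (t for t in counts_by_title if t != title)
    return max(others, key=lambda t: cosine_similarity(target, counts_by_title[t]))


# Hypothetical stand-ins for full-text books; a real collection would be read from files.
texts = {
    "whaling": "the whale and the ship and the sea",
    "sailing": "the ship sailed the sea",
    "farming": "the farmer plowed a field",
}
counts_by_title = {title: word_counts(body) for title, body in texts.items()}

if __name__ == "__main__":
    print(counts_by_title["whaling"].most_common(3))
    print(most_similar("whaling", counts_by_title))
```

Nothing here "understands" the texts. It only counts, which is
exactly why it scales to collections no human could read.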

If these things are true, and if they begin to change the
expectations of researchers and raise the bar of scholarship, then
how might the role of libraries need to change? In an environment
where everybody has collections -- anybody can download all of the
Open Content Alliance's 360,000 (and growing) texts today -- what
sorts of services do libraries provide against these collections, if
any?

Towards the end of the report were a number of recommendations --
priorities for future work:

   1. Find ways to provide analytical access to
      the Open Content book data.

   2. Apply questions regarding programmer (API)
      access to all open content collections.

   3. Compare costs of scanning with the intensive
      process of transcription.

   4. Understand how the value of domain-specific
      services can be applied to common collections.

   5. Examine the future education of the
      information professional.

All of this makes me think the "'next generation' library catalog" is
not so much about find and discover but more about acquire and
manipulate. I see these "catalogs" as more akin to indexes coupled
with user services, allowing students, instructors, and scholars to
use the content they download in a myriad of ways.

The 32-page report/article was not hard to read and was thoroughly
engaging. It was intellectually stimulating and great food for
thought. You gotta love academia.

--
Eric Lease Morgan
University Libraries of Notre Dame