Re: Book-scanning projects - a question

From: Roche GS11 Rachel R US <rocherr_at_nyob>
Date: Thu, 1 Jul 2010 14:05:11 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Don't forget the government documents.  Between those and the pre-1923
books, I think it would be a pretty impressive library.  Of course, it
would be impossible for that library to meet every patron need, but
imagine all the things the average public library would no longer have
to carry, and academic libraries could keep fewer physical copies or put
them in storage facilities.  Depending upon what sort of storage method
we used for these documents, we could analyze government reports or
determine which poems are in which pre-1923 anthologies.

So, here is my question: Does it really matter?

Rae Roche
Library of the Marine Corps

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Perry Willett
Sent: Thursday, July 01, 2010 1:51 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Book-scanning projects - a question

This seems like a trick question. Why would this surprise you?

Perry Willett
California Digital Library

On Thu, Jul 1, 2010 at 10:37 AM, B.G. Sloan <bgsloan2_at_yahoo.com> wrote:

>
> But my main point was: don't most freely-accessible digitized
> collections consist of books published prior to 1923?
>
> Of course Google is the big exception. But until the Google books
> settlement is finalized, we won't know how many post-1923 books we'll
> be able to get full text for from Google.
>
> Bernie Sloan
>
> --- On Thu, 7/1/10, Eric Lease Morgan <emorgan_at_ND.EDU> wrote:
>
>
> From: Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: Re: [NGC4LIB] Book-scanning projects - a question
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Thursday, July 1, 2010, 1:25 PM
>
>
> On Jul 1, 2010, at 11:40 AM, B.G. Sloan wrote:
>
> > Most of the book-scanning projects are focusing on digitizing works in
> > the public domain, right? And the public domain is basically books
> > published before 1923, right? So, aren't most of these projects the
> > equivalent of building a physical library collection of pre-1923 books?
>
>
>
> Along the lines of what is outlined above, I have done a bit of an
> experiment to see how difficult it would be to supplement our physical
> holdings with the digital holdings of the Internet Archive. After all,
> the content there is free. Here's how:
>
>   1. Dump - Export all of your bibliographic MARC
>      records to a file.
>
>   2. Parse - Extract the authors, titles, and
>      other identifying information from a MARC
>      record.
>
>   3. Search - Use the result of Step #2 to create
>      REST-like searches of the Internet Archive
>      making sure results are returned as XML (or
>      some other machine-readable format).
>
>   4. Verify - Validate the search results making
>      sure they correctly match the MARC. There may
>      be false hits, or there may be multiple hits.
>
>   5. Download - For each Internet Archive record
>      that adequately matches the MARC record,
>      mirror the remote Internet Archive version of
>      the data locally, both the PDF and the plain
>      text.
>
>   6. Update - For each downloaded record, update the
>      MARC record with two additional URLs. One
>      pointing to the Internet Archive, and another
>      pointing to your local mirror.
>
>   7. Go to Step #2 - Continue the process for each
>      record in your set of MARC records.
>
>   8. Reindex - Make searchable your MARC records as
>      well as the full text that has been mirrored.
>
>   9. Provide services - Enable search against the index.
>      Search results can point to your local physical
>      copy, your local mirrored copy, as well as the
>      remote (canonical) Internet Archive copy. Provide
>      services against the results enabling users to do
>      things like: print-on-demand, bind, do concordance
>      against, generate word cloud, put on reserve, add
>      to a syllabus, annotate, rank, review, graphically
>      illustrate the use of frequently used n-grams, etc.
>
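
For anyone who wants to try Step #3, here is a minimal sketch of how the search URL might be built. It assumes the Internet Archive's advancedsearch.php endpoint and its title, creator, and mediatype fields, none of which are named in this thread, and it asks for JSON rather than XML. The function name build_ia_query is made up for illustration, and the title and author are assumed to come out of Step #2 as plain strings.

```python
from urllib.parse import urlencode

# Assumption: the Internet Archive advanced search endpoint (not named in the thread).
IA_SEARCH = "https://archive.org/advancedsearch.php"


def build_ia_query(title, author, rows=10):
    """Build a REST-like search URL for one title/author pair (Step #3)."""
    # Replace embedded double quotes so each value can be sent as a quoted phrase.
    title = title.replace('"', " ").strip()
    author = author.replace('"', " ").strip()
    query = f'title:("{title}") AND creator:("{author}") AND mediatype:texts'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for the verify step
        ("fl[]", "title"),
        ("fl[]", "creator"),
        ("rows", rows),
        ("output", "json"),      # machine-readable results
    ]
    return IA_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_ia_query("The Imitation of Christ", "Thomas a Kempis"))
```

The sketch only builds the URL. Fetching it and parsing the response is left out so that nothing here depends on the network or on the exact shape of the reply.
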
> As alluded to above, some of this work has been done. More specifically,
> using the MARC records from a thing called the "Catholic Portal", a
> graduate student and I did Steps #1 through #7. The hardest part is
> Step #4. The coolest part is Step #9.
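
Since Step #4 is called the hardest part, here is one rough way the verification might be sketched. It is not the method Eric and his student used. It assumes each MARC record has been reduced to a dict with "title" and "author" keys and each search hit to a dict with "title" and "creator" keys. The 0.85 threshold is an arbitrary starting point, and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def pick_match(marc, hits, threshold=0.85):
    """Return the one hit whose title and author both clear the threshold.

    Returns None when there is no such hit or more than one, so that
    false hits and multiple hits are left for a person to review.
    """
    good = [
        hit for hit in hits
        if similarity(marc["title"], hit["title"]) >= threshold
        and similarity(marc["author"], hit["creator"]) >= threshold
    ]
    return good[0] if len(good) == 1 else None
```

Refusing to choose between several plausible hits is deliberately conservative. A real run would probably also compare dates or publishers before downloading anything.
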
>
> If we, as a profession, were to get to Step #9, then we would be seen as
> providing truly cutting-edge and valuable services to our constituents.
> Step #9 represents the growth opportunity.
>
> --
> Eric Lease Morgan
> University of Notre Dame