Re: Book-scanning projects - a question

From: B.G. Sloan <bgsloan2_at_nyob>
Date: Thu, 1 Jul 2010 11:38:11 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU
 
It wasn't *intended* to be a trick question. I just wanted to verify that my simple generalization was largely correct.
 
I'm also curious about how many exceptions there might be to this "pre-1923" rule of thumb, e.g., Openlibrary.org's experiment with the idea of "loaning" in-copyright books to the masses (see: bit.ly/ckog5h).
 
Bernie Sloan

--- On Thu, 7/1/10, Perry Willett <pwillett01_at_GMAIL.COM> wrote:


From: Perry Willett <pwillett01_at_GMAIL.COM>
Subject: Re: [NGC4LIB] Book-scanning projects - a question
To: NGC4LIB_at_LISTSERV.ND.EDU
Date: Thursday, July 1, 2010, 1:51 PM


This seems like a trick question. Why would this surprise you?

Perry Willett
California Digital Library


On Thu, Jul 1, 2010 at 10:37 AM, B.G. Sloan <bgsloan2_at_yahoo.com> wrote:

>
> But my main point was: don't most freely accessible digitized collections
> consist of books published prior to 1923?
>
> Of course, Google is the big exception. But until the Google Books
> settlement is finalized, we won't know how many post-1923 books we'll be
> able to get full text for from Google.
>
> Bernie Sloan
>
> --- On Thu, 7/1/10, Eric Lease Morgan <emorgan_at_ND.EDU> wrote:
>
>
> From: Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: Re: [NGC4LIB] Book-scanning projects - a question
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Thursday, July 1, 2010, 1:25 PM
>
>
> On Jul 1, 2010, at 11:40 AM, B.G. Sloan wrote:
>
> > Most of the book-scanning projects are focusing on digitizing works in
> the public domain, right? And the public domain is basically books published
> before 1923, right? So, aren't most of these projects the equivalent of
> building a physical library collection of pre-1923 books?
>
>
>
> Along the lines of what is outlined above, I have done a bit of an
> experiment to see how difficult it would be to supplement our physical
> holdings with the digital holdings of the Internet Archive. After all, the
> content there is free. Here's how:
>
>   1. Dump - Export all of your bibliographic MARC
>      records to a file.
>
>   2. Parse - Extract the authors, titles, and
>      other identifying information from each
>      MARC record.
>
>   3. Search - Use the result of Step #2 to create
>      REST-like searches of the Internet Archive,
>      making sure results are returned as XML (or
>      some other machine-readable format).
>
>   4. Verify - Validate the search results, making
>      sure they correctly match the MARC record. There
>      may be false hits, or there may be multiple hits.
>
>   5. Download - For each Internet Archive record
>      that adequately matches the MARC record,
>      mirror the remote Internet Archive data
>      locally: the PDF as well as the plain text.
>
>   6. Update - For each downloaded record, update the
>      MARC record with two additional URLs: one
>      pointing to the Internet Archive, and another
>      pointing to your local mirror.
>
>   7. Go to Step #2 - Continue the process for each
>      record in your set of MARC records.
>
>   8. Reindex - Make your MARC records, as well as
>      the mirrored full text, searchable.
>
>   9. Provide services - Enable search against the index.
>      Search results can point to your local physical
>      copy, your local mirrored copy, and the remote
>      (canonical) Internet Archive copy. Provide
>      services against the results, enabling users to do
>      things like: print on demand, bind, build a
>      concordance, generate a word cloud, put on reserve,
>      add to a syllabus, annotate, rank, review,
>      graphically illustrate frequently used n-grams, etc.
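Steps #3, #5, and #6 are largely a matter of building URLs. Below is a minimal Python sketch, assuming the author and title have already been extracted from each MARC record in Step #2. It assumes Internet Archive's advancedsearch.php endpoint with its q, fl[], rows, and output parameters (pass output="xml" for XML as described above). The PDF and _djvu.txt file names, the identifier "exampleid00", and the mirror base URL are hypothetical, and the 856 fields are written as MarcEdit-style mnemonic text rather than through a MARC library. Nothing is actually fetched.

```python
from urllib.parse import urlencode

# Assumed endpoint: Internet Archive's advanced search (verify before relying on it).
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"
# Fields requested for each hit; these feed Step #4 (Verify).
RETURN_FIELDS = ["identifier", "title", "creator", "date"]


def build_search_url(author: str, title: str, rows: int = 10, output: str = "json") -> str:
    """Step #3: turn an author/title pair from Step #2 into a search URL."""
    # Drop embedded double quotes so they cannot break the phrase syntax.
    author = author.replace('"', "")
    title = title.replace('"', "")
    query = f'title:("{title}") AND creator:("{author}") AND mediatype:(texts)'
    params = [("q", query)]
    params += [("fl[]", field) for field in RETURN_FIELDS]
    params += [("rows", str(rows)), ("output", output)]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def download_urls(identifier: str) -> dict:
    """Step #5: guess download URLs for an item's PDF and plain text.

    The file names follow a hypothetical convention; real items vary,
    so a sturdier version would read the item's actual file list.
    """
    base = f"https://archive.org/download/{identifier}"
    return {
        "pdf": f"{base}/{identifier}.pdf",
        "text": f"{base}/{identifier}_djvu.txt",
    }


def marc_856_lines(identifier: str, mirror_base: str) -> list:
    """Step #6: two 856 fields, one for the remote copy, one for the mirror.

    Indicators 4/1 = HTTP access to an electronic version of the item.
    Output is MarcEdit-style mnemonic text, not binary MARC.
    """
    remote = f"https://archive.org/details/{identifier}"
    local = f"{mirror_base.rstrip('/')}/{identifier}/"
    return [
        f"=856  41$u{remote}$zInternet Archive copy",
        f"=856  41$u{local}$zLocal mirror",
    ]


if __name__ == "__main__":
    print(build_search_url("Newman, John Henry", "Apologia pro vita sua"))
    print(download_urls("exampleid00"))  # hypothetical identifier
    for line in marc_856_lines("exampleid00", "https://library.example.edu/mirror"):
        print(line)
```

The search URL can be fetched with any HTTP client; each identifier in the results then feeds download_urls() and marc_856_lines(), but only after the match has been checked in Step #4.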
>
> As alluded to above, some of this work has been done. More specifically,
> using the MARC records from a thing called the "Catholic Portal", a graduate
> student and I did Steps #1 through #7. The hardest part is Step #4. The
> coolest part is Step #9.
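Since Step #4 is the hardest part, here is one possible approach, sketched in Python with only the standard library: fuzzy comparison of normalized titles plus a surname check on the author. The normalization rules, the 0.85 threshold, the surname rule, and the candidate dictionaries (shaped like the fields requested in the search) are all assumptions for illustration, not the method actually used on the Catholic Portal records.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def as_text(value) -> str:
    """A search hit may carry a field as a string or a list of strings."""
    if isinstance(value, list):
        return " ".join(value)
    return value or ""


def title_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def surname_matches(marc_author: str, creator: str) -> bool:
    """Hypothetical rule: the MARC heading's surname must appear in the creator."""
    surname = normalize(marc_author.split(",")[0])
    return bool(surname) and f" {surname} " in f" {normalize(creator)} "


def classify(marc: dict, candidates: list, threshold: float = 0.85) -> tuple:
    """Step #4: sort candidate hits into "none", "single", or "multiple".

    A "multiple" result would need a person to pick the right item.
    """
    matches = [
        c for c in candidates
        if title_similarity(marc["title"], as_text(c.get("title"))) >= threshold
        and surname_matches(marc["author"], as_text(c.get("creator")))
    ]
    if not matches:
        return "none", []
    if len(matches) == 1:
        return "single", matches
    return "multiple", matches


if __name__ == "__main__":
    record = {"title": "Apologia pro vita sua /", "author": "Newman, John Henry, 1801-1890."}
    hits = [  # hypothetical search results
        {"identifier": "exampleid00", "title": "Apologia pro vita sua",
         "creator": "Newman, John Henry, 1801-1890"},
        {"identifier": "exampleid01", "title": "The idea of a university",
         "creator": ["Newman, John Henry"]},
    ]
    print(classify(record, hits))
```

Normalizing first lets trailing ISBD punctuation such as the " /" in a 245 title drop out before comparison. Subtitles, reprint dates, and variant name forms would still cause false hits and misses, which is why this step tends to need human review.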
>
> If we, as a profession, were to get to Step #9, then we would be seen as
> providing truly cutting-edge and valuable services to our constituents. Step
> #9 represents the growth opportunity.
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
Received on Thu Jul 01 2010 - 14:39:33 EDT