Re: Book-scanning projects - a question

From: Perry Willett <pwillett01_at_nyob> Date: Thu, 1 Jul 2010 11:54:31 -0700 To: NGC4LIB_at_LISTSERV.ND.EDU

Bernie,

I think your point is largely correct. Exceptions include gov docs, a few
pilot projects with univ presses, some projects where the library either
owns the rights or received permission from rights holders, but these are
around the edges. Google of course is the big exception.

Perry Willett
California Digital Library

On Thu, Jul 1, 2010 at 11:38 AM, B.G. Sloan <bgsloan2_at_yahoo.com> wrote:

>
> It wasn't *intended* to be a trick question. I just wanted to verify that
> my simple generalization was largely correct.
>
> I'm also curious about how many exceptions there might be to this
> "pre-1923" rule of thumb, e.g., Openlibrary.org's experiment with the idea
> of "loaning" in-copyright books to the masses (see: bit.ly/ckog5h).
>
> Bernie Sloan
>
> --- On Thu, 7/1/10, Perry Willett <pwillett01_at_GMAIL.COM> wrote:
>
>
> From: Perry Willett <pwillett01_at_GMAIL.COM>
> Subject: Re: [NGC4LIB] Book-scanning projects - a question
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Thursday, July 1, 2010, 1:51 PM
>
>
> This seems like a trick question. Why would this surprise you?
>
> Perry Willett
> California Digital Library
>
>
> On Thu, Jul 1, 2010 at 10:37 AM, B.G. Sloan <bgsloan2_at_yahoo.com> wrote:
>
> >
> > But my main point was: don't most freely-accessible digitized collections
> > consist of books published prior to 1923?
> >
> > Of course Google is the big exception. But until the Google books
> > settlement is finalized, we won't know how many post-1923 books we'll be
> > able to get full text for from Google.
> >
> > Bernie Sloan
> >
> > --- On Thu, 7/1/10, Eric Lease Morgan <emorgan_at_ND.EDU> wrote:
> >
> >
> > From: Eric Lease Morgan <emorgan_at_ND.EDU>
> > Subject: Re: [NGC4LIB] Book-scanning projects - a question
> > To: NGC4LIB_at_LISTSERV.ND.EDU
> > Date: Thursday, July 1, 2010, 1:25 PM
> >
> >
> > On Jul 1, 2010, at 11:40 AM, B.G. Sloan wrote:
> >
> > > Most of the book-scanning projects are focusing on digitizing works in
> > > the public domain, right? And the public domain is basically books
> > > published before 1923, right? So, aren't most of these projects the
> > > equivalent of building a physical library collection of pre-1923 books?
> >
> >
> >
> > Along the lines of what is outlined above, I have done a bit of an
> > experiment to see how difficult it would be to supplement our physical
> > holdings with the digital holdings of the Internet Archive. After all,
> > the content there is free. Here's how:
> >
> >   1. Dump - Export all of your bibliographic MARC
> >      records to a file.
> >
> >   2. Parse - Extract the authors, titles, and
> >      other identifying information from a MARC
> >      record.
> >
> >   3. Search - Use the result of Step #2 to create
> >      REST-like searches of the Internet Archive
> >      making sure results are returned as XML (or
> >      some other machine-readable format).
> >
> >   4. Verify - Validate the search results making
> >      sure they correctly match the MARC. There may
> >      be false hits, or there may be multiple hits.
> >
> >   5. Download - For each Internet Archive record
> >      that adequately matches the MARC record,
> >      mirror the remote Internet Archive version of
> >      the data locally, both the PDF and the plain
> >      text.
> >
> >   6. Update - For each downloaded record, update the
> >      MARC record with two additional URLs. One
> >      pointing to the Internet Archive, and another
> >      pointing to your local mirror.
> >
> >   7. Go to Step #2 - Continue the process for each
> >      record in your set of MARC records.
> >
> >   8. Reindex - Make searchable your MARC records as
> >      well as the full text that has been mirrored.
> >
> >   9. Provide services - Enable search against the index.
> >      Search results can point to your local physical
> >      copy, your local mirrored copy, as well as the
> >      remote (canonical) Internet Archive copy. Provide
> >      services against the results enabling users to do
> >      things like: print-on-demand, bind, do concordance
> >      against, generate word cloud, put on reserve, add
> >      to a syllabus, annotate, rank, review, graphically
> >      illustrate the use of frequently used n-grams, etc.
> >
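Step #3 above can be sketched roughly as follows. This is only an illustration, not the code used in the experiment: it assumes the Internet Archive's public advancedsearch.php endpoint and the title, creator, and mediatype metadata fields, and the function name build_search_url is made up for this example. It asks for JSON rather than XML, and it only builds the URL; fetching and parsing the response is left out. Check the parameters against the Archive's current search documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Internet Archive's advanced search interface.
IA_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(title, author, rows=10):
    """Build a REST-like search URL from a title and author taken from MARC.

    Hypothetical helper for illustration only.
    """
    def phrase(value):
        # Drop embedded double quotes so the value can be sent as one phrase.
        return '"' + value.replace('"', " ").strip() + '"'

    query = (
        f"title:({phrase(title)}) AND creator:({phrase(author)}) "
        "AND mediatype:texts"
    )
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each hit
        ("fl[]", "title"),
        ("fl[]", "creator"),
        ("rows", str(rows)),
        ("output", "json"),      # machine-readable results
    ]
    return IA_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("Apologia pro vita sua", "Newman, John Henry"))
```

Searching on both title and author keeps the result set small, which makes the verification in Step #4 easier, at the cost of missing records whose author is recorded differently.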
> > As alluded to above, some of this work has been done. More specifically,
> > using the MARC records from a thing called the "Catholic Portal", a
> > graduate student and I did Steps #1 through #7. The hardest part is
> > Step #4. The coolest part is Step #9.
> >
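Since Step #4 is described as the hardest part, here is one minimal way the verification might be approached. It is a sketch under stated assumptions, not the method used for the Catholic Portal records: each search hit is assumed to be a dict with "title" and "creator" keys, the function names are invented for this example, and the 0.85 threshold is arbitrary and would need tuning against real data. It normalizes strings, compares them with Python's difflib, and separates clean matches from false hits and multiple hits.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify_hits(marc_title, marc_author, hits, threshold=0.85):
    """Sort search hits into 'match', 'ambiguous', or 'no match'.

    hits: list of dicts with 'title' and 'creator' keys (assumed shape).
    threshold: arbitrary cut-off chosen for illustration.
    """
    matches = [
        hit for hit in hits
        if similarity(marc_title, hit["title"]) >= threshold
        and similarity(marc_author, hit["creator"]) >= threshold
    ]
    if not matches:
        return "no match", []
    if len(matches) == 1:
        return "match", matches
    return "ambiguous", matches  # multiple hits need a human or more fields
```

In practice a comparison on title and author alone will confuse different editions of the same work, so publication dates or other identifiers from the MARC record would likely have to be compared as well.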
> > If we, as a profession, were to get to Step #9, then we would be seen as
> > providing truly cutting edge and valuable services to our constituents.
> > Step #9 represents the growth opportunity.
> >
> > --
> > Eric Lease Morgan
> > University of Notre Dame
> >