Re: Book-scanning projects - a question

From: Karen Coyle <lists_at_nyob>
Date: Thu, 1 Jul 2010 11:47:40 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU
Quoting "B.G. Sloan" <bgsloan2_at_YAHOO.COM>:

>  
> But my main point was: don't most freely-accessible digitized   
> collections consist of books published prior to 1923?

Internet Archive is digitizing post-'23 books and making them  
available in DAISY format to visually impaired readers.[1] Also note  
that many post-'23 books are in the public domain -- if they weren't  
renewed. I have heard of (but unfortunately cannot cite) studies that  
conclude that some 2/3 of the post-'23 works didn't get renewed, and  
therefore are PD. 1923 is used as a cut-off because it is computable  
based on metadata. It's a crude measure, and we need to get beyond it.
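
To illustrate why a date cut-off is so easy to compute and so crude, here is a minimal sketch. It assumes US-published works and the rules as they stood in 2010. The 1963 upper bound on the renewal window and the function name `rights_guess` are my own additions for illustration, and the sketch ignores notice requirements, foreign works, and everything else that makes real determinations hard.

```python
def rights_guess(pub_year, renewed=None):
    """Rough rights guess from a publication year (hypothetical helper).

    renewed: True/False if a renewal search has been done, None if not.
    Assumes US publication and 2010-era rules; not legal advice.
    """
    if pub_year < 1923:
        # The crude cut-off: computable from the date field alone.
        return "public domain"
    if pub_year <= 1963:
        # Assumed window in which copyright had to be renewed.
        if renewed is None:
            return "unknown: renewal check needed"
        return "in copyright" if renewed else "public domain"
    return "presumed in copyright"


if __name__ == "__main__":
    for year, renewed in [(1910, None), (1940, None), (1940, False), (1990, None)]:
        print(year, renewed, "->", rights_guess(year, renewed))
```

The middle branch is the one that date-only metadata cannot resolve, which is where the unrenewed post-'23 works end up.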

kc

[1]http://openlibrary.org/subjects/protected_daisy

>  
> Of course Google is the big exception. But until the Google books   
> settlement is finalized, we won't know how many post-1923 books   
> we'll be able to get full text for from Google.
>  
> Bernie Sloan
>
> --- On Thu, 7/1/10, Eric Lease Morgan <emorgan_at_ND.EDU> wrote:
>
>
> From: Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: Re: [NGC4LIB] Book-scanning projects - a question
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Thursday, July 1, 2010, 1:25 PM
>
>
> On Jul 1, 2010, at 11:40 AM, B.G. Sloan wrote:
>
>> Most of the book-scanning projects are focusing on digitizing works
>> in the public domain, right? And the public domain is basically
>> books published before 1923, right? So, aren't most of these
>> projects the equivalent of building a physical library collection
>> of pre-1923 books?
>
>
>
> Along the lines of what is outlined above, I have done a bit of an
> experiment to see how difficult it would be to supplement our
> physical holdings with the digital holdings of the Internet Archive.
> After all, the content there is free. Here's how:
>
>   1. Dump - Export all of your bibliographic MARC
>      records to a file.
>
>   2. Parse - Extract the authors, titles, and
>      other identifying information from a MARC
>      record.
>
>   3. Search - Use the result of Step #2 to create
>      REST-like searches of the Internet Archive
>      making sure results are returned as XML (or
>      some other machine-readable format).
>
>   4. Verify - Validate the search results making
>      sure they correctly match the MARC. There may
>      be false hits, or there may be multiple hits.
>
>   5. Download - For each Internet Archive record
>      that adequately matches the MARC record,
>      mirror the remote Internet Archive version of
>      the data locally. Mirror the PDF as well as
>      the plain text.
>
>   6. Update - For each downloaded record, update the
>      MARC record with two additional URLs: one
>      pointing to the Internet Archive and another
>      pointing to your local mirror.
>
>   7. Go to Step #2 - Continue the process for each
>      record in your set of MARC records.
>
>   8. Reindex - Make your MARC records searchable,
>      along with the full text that has been mirrored.
>
>   9. Provide services - Enable search against the index.
>      Search results can point to your local physical
>      copy, your local mirrored copy, as well as the
>      remote (canonical) Internet Archive copy. Provide
>      services against the results, enabling users to do
>      things like: print on demand, bind, build a
>      concordance, generate a word cloud, put on reserve,
>      add to a syllabus, annotate, rank, review, graphically
>      illustrate the use of frequently used n-grams, etc.
>
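Step #3 above could be sketched roughly as follows. This is only an illustration, not the code used in the experiment. It assumes the Internet Archive's advanced search endpoint at `archive.org/advancedsearch.php` with its `q`, `fl[]`, `rows`, and `output` parameters, and it assumes the title and author have already been pulled out of the MARC record in Step #2. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for "REST-like" searches of the Internet Archive.
IA_SEARCH = "https://archive.org/advancedsearch.php"


def build_ia_params(title, author=None, rows=10):
    """Turn fields parsed from a MARC record into search parameters."""
    # Drop embedded double quotes so the phrase query stays well-formed.
    clauses = ['title:("%s")' % title.replace('"', " ").strip()]
    if author:
        clauses.append('creator:("%s")' % author.replace('"', " ").strip())
    clauses.append("mediatype:texts")  # restrict hits to books and other texts
    return [
        ("q", " AND ".join(clauses)),
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("fl[]", "creator"),
        ("rows", rows),
        ("output", "xml"),  # machine-readable results, as Step #3 asks
    ]


def build_ia_url(title, author=None, rows=10):
    """Return the full search URL; fetching it is left to the caller."""
    return IA_SEARCH + "?" + urlencode(build_ia_params(title, author, rows))


if __name__ == "__main__":
    print(build_ia_url("Apologia pro vita sua", "Newman", rows=5))
```

The sketch stops at building the URL so that it can be checked without touching the network. Fetching and parsing the XML would feed Step #4.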
> As alluded to above, some of this work has been done. More
> specifically, using the MARC records from a thing called the
> "Catholic Portal", a graduate student and I did Steps #1 through #7.
> The hardest part is Step #4. The coolest part is Step #9.
>
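Since Step #4 is named as the hardest part, here is one hedged way the verification might be approached: normalize both titles, score them with a string-similarity ratio, and sort each MARC record into "match", "ambiguous" (multiple hits), or "no-match" (only false hits). The 0.9 threshold, the hit dictionaries, and all function names are assumptions for illustration. A real workflow would probably compare authors and dates too.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio between two normalized titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify_hits(marc_title, hits, threshold=0.9):
    """Label search results for one MARC record.

    hits: list of dicts with at least a "title" key (assumed shape).
    Returns (label, matching_hits).
    """
    good = [h for h in hits if similarity(marc_title, h["title"]) >= threshold]
    if not good:
        return "no-match", []
    if len(good) == 1:
        return "match", good
    return "ambiguous", good  # multiple hits need a human or more fields


if __name__ == "__main__":
    hits = [
        {"identifier": "a", "title": "Apologia pro vita sua."},
        {"identifier": "b", "title": "The idea of a university"},
    ]
    print(classify_hits("Apologia Pro Vita Sua", hits))
```

Only records labeled "match" would go on to the download in Step #5. The "ambiguous" ones are where most of the manual effort would likely go.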
> If we, as a profession, were to get to Step #9, then we would be
> seen as providing truly cutting-edge and valuable services to our
> constituents. Step #9 represents the growth opportunity.
>
> --
> Eric Lease Morgan
> University of Notre Dame
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Thu Jul 01 2010 - 14:48:36 EDT