Quoting "B.G. Sloan" <bgsloan2_at_YAHOO.COM>:
>
> But my main point was: don't most freely-accessible digitized
> collections consist of books published prior to 1923?
Internet Archive is digitizing post-'23 books and making them
available in DAISY format to visually impaired readers.[1] Also note
that many post-'23 books are in the public domain, because their
copyrights were never renewed. I have heard of (but unfortunately
cannot cite) studies that conclude that some 2/3 of post-'23 works
didn't get renewed, and are therefore PD. 1923 is used as a cut-off
because it can be computed from the publication date in the metadata.
It's a crude measure, and we need to get beyond it.
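To make the "computable" point concrete, here is a rough sketch in Python. The function name rights_guess and the renewed argument are hypothetical, made up for illustration. The publication year is in the catalog record, but the renewal flag is not, so it would have to come from a separate search of copyright renewal records. The sketch also applies the renewal rule to every post-'23 year, which is a simplification: it does not model the later period in which renewal was no longer required.

```python
from typing import Optional

CUTOFF_YEAR = 1923  # the cut-off discussed in this thread


def rights_guess(pub_year: int, renewed: Optional[bool] = None) -> str:
    """Guess the rights status of a US book from its publication year.

    `renewed` is a hypothetical input: True or False if someone has done
    a renewal search, None if nobody has checked (the usual case when
    all we have is bibliographic metadata).
    """
    if pub_year < CUTOFF_YEAR:
        # Decidable from the metadata alone.
        return "public domain"
    if renewed is False:
        # Simplification: treats every post-'23 year the same way.
        return "public domain (not renewed)"
    if renewed is True:
        return "in copyright"
    # No renewal information, so the metadata alone cannot decide.
    return "unknown"
```

With only a date to go on, rights_guess(1940) comes back "unknown", which is exactly the gap a renewal lookup would fill.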
kc
[1] http://openlibrary.org/subjects/protected_daisy
>
> Of course Google is the big exception. But until the Google books
> settlement is finalized, we won't know how many post-1923 books
> we'll be able to get full text for from Google.
>
> Bernie Sloan
>
> --- On Thu, 7/1/10, Eric Lease Morgan <emorgan_at_ND.EDU> wrote:
>
>
> From: Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: Re: [NGC4LIB] Book-scanning projects - a question
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Date: Thursday, July 1, 2010, 1:25 PM
>
>
> On Jul 1, 2010, at 11:40 AM, B.G. Sloan wrote:
>
>> Most of the book-scanning projects are focusing on digitizing works
>> in the public domain, right? And the public domain is basically
>> books published before 1923, right? So, aren't most of these
>> projects the equivalent of building a physical library collection
>> of pre-1923 books?
>
>
>
> Along the lines of what is outlined above, I have done a bit of an
> experiment to see how difficult it would be to supplement our
> physical holdings with the digital holdings of the Internet Archive.
> After all, the content there is free. Here's how:
>
> 1. Dump - Export all of your bibliographic MARC
> records to a file.
>
> 2. Parse - Extract the authors, titles, and
> other identifying information from a MARC
> record.
>
> 3. Search - Use the result of Step #2 to create
> REST-like searches of the Internet Archive
> making sure results are returned as XML (or
> some other machine-readable format).
>
> 4. Verify - Validate the search results making
> sure they correctly match the MARC. There may
> be false hits, or there may be multiple hits.
>
> 5. Download - For each Internet Archive record
> that adequately matches the MARC record,
> mirror the remote Internet Archive version of
> the data locally, both the PDF and the plain
> text.
>
> 6. Update - For each downloaded record, update the
> MARC record with two additional URLs. One
> pointing to the Internet Archive, and another
> pointing to your local mirror.
>
> 7. Go to Step #2 - Continue the process for each
> record in your set of MARC records.
>
> 8. Reindex - Make your MARC records searchable, as
> well as the full text that has been mirrored.
>
> 9. Provide services - Enable search against the index.
> Search results can point to your local physical
> copy, your local mirrored copy, as well as the
> remote (canonical) Internet Archive copy. Provide
> services against the results enabling users to do
> things like: print-on-demand, bind, do concordance
> against, generate word cloud, put on reserve, add
> to a syllabus, annotate, rank, review, graphically
> illustrate the use of frequently used n-grams, etc.
>
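Steps #3 and #5 above might be sketched roughly as follows in Python. This is not the code used in the experiment described here, only an illustration. It assumes the Internet Archive's advancedsearch.php endpoint and its /download/ URL pattern; the choice of fields (title, creator, identifier), the JSON output, and the function names are my assumptions. File names inside an item vary and would need to be read from the item's own file listing.

```python
from urllib.parse import quote, urlencode

# Internet Archive search endpoint. Which fields to query and return is
# an assumption made for this sketch.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_search_url(title: str, author: str = "", rows: int = 10) -> str:
    """Build a REST-like search URL that asks for JSON results (Step #3)."""
    # Strip embedded double quotes so the phrase query stays well formed.
    clauses = ['title:"%s"' % title.replace('"', " ").strip()]
    if author:
        clauses.append('creator:"%s"' % author.replace('"', " ").strip())
    clauses.append("mediatype:texts")
    params = [
        ("q", " AND ".join(clauses)),
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("fl[]", "creator"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_URL + "?" + urlencode(params)


def download_url(identifier: str, filename: str) -> str:
    """URL of one file belonging to an item, for local mirroring (Step #5)."""
    return "https://archive.org/download/%s/%s" % (
        quote(identifier),
        quote(filename),
    )
```

The title and author strings would come out of Step #2, and the URLs returned here are what a mirroring script would fetch and what Step #6 would write back into the MARC record.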
> As alluded to above, some of this work has been done. More
> specifically, using the MARC records from a thing called the
> "Catholic Portal", a graduate student and I did Steps #1 through #7.
> The hardest part is Step #4. The coolest part is Step #9.
>
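Since Step #4 is singled out as the hard part, here is one way the verification might look, sketched in Python. It is not the method used in the experiment; it simply compares normalized titles with the standard library's difflib and keeps hits above a threshold. The 0.9 threshold, the function names, and the shape of a "hit" (a dict with "identifier" and "title" keys) are all assumptions. A real matcher would presumably also weigh author, date, and edition, and would need better handling of non-ASCII titles.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity of two titles after normalization, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def verify(marc_title: str, hits: list, threshold: float = 0.9) -> list:
    """Return the hits whose title is close enough, best match first.

    An empty result is a false hit (or no hit at all); more than one
    result is the "multiple hits" case, where some further rule or a
    person has to choose.
    """
    scored = [(similarity(marc_title, h.get("title", "")), h) for h in hits]
    good = [(score, h) for score, h in scored if score >= threshold]
    good.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in good]
```

Normalizing first matters because MARC titles carry ISBD punctuation (a trailing " /", for instance) that the Internet Archive's metadata usually lacks.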
> If we, as a profession, were to get to Step #9, then we would be
> seen as providing truly cutting edge and valuable services to our
> constituents. Step #9 represents the growth opportunity.
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Thu Jul 01 2010 - 14:48:36 EDT