Re: Library Books?

From: Karen Coyle <lists_at_nyob>
Date: Fri, 2 Jul 2010 09:03:10 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU

Quoting Stephen Paling <paling_at_WISC.EDU>:

> Folks have made a lot of good points, and I'd like to add several   
> other quick ones.
>
> Keep in mind that Google is not using the kind of flatbed scanner   
> that most of us are used to. Their scanners incorporate multiple   
> cameras and software that can take into account the curve of the   
> page as the book sits on the scanner, and then render a flat image.   
> So it's not just a matter of buying a normal scanner.

Although Google's scanner set-up is secret and proprietary, I believe  
that non-destructive scanning of the type you describe above is now  
the norm for most book scanning projects. The Internet Archive has  
developed its own scanning hardware. [1] There was a demo of a $300  
high speed book scanner at a conference in the last year or so. [2]  
And running OCR isn't a big deal today, since there is off-the-shelf  
software. What Google has that no one else has, of course, is the $$  
and cycles to do computable improvements to the scans and the derived  
text.

kc
[1] http://www.wired.com/entertainment/theweb/multimedia/2008/03/gallery_internet_archive
[2] http://boingboing.net/2009/04/20/howo-make-a-300-high.html

>
> Google is, I believe, using OCR on all of the books. In other words,
> within-book searches are possible. If you only have page images
> without OCR, you've only made a small amount of progress. The images
> aren't searchable.
>
> We should also ask how much mileage we'll get out of digitizing
> books. Will it help? Almost surely. Will it change the fact that the
> lion's share of information will never appear in books? No.
>
> Steve
>
> =====================================
> Stephen Paling
> Assistant Professor
> School of Library and Information Studies
> 4251 Helen C. White Hall
> 600 N. Park St.
> Madison, WI 53706-1403
> Phone: (608) 263-2944
> Fax: (608) 263-4849
> paling_at_wisc.edu
>
> ----- Original Message -----
> From: "Montibello, Joseph P." <jmontibello_at_EXETER.EDU>
> Date: Thursday, July 1, 2010 8:41 am
> Subject: [NGC4LIB] Library Books?
> To: NGC4LIB_at_LISTSERV.ND.EDU
>
>> Hi all,
>>
>>
>>
>> Stephen Paling wrote:
>>
>>
>>
>> "To put it a bit differently, what I want is ~in~ the document, not next
>> to it as a surrogate. The amount of information that is available online
>> now dwarfs the information available in print, and searching within
>> those online resources is typically far more useful to me."
>>
>>
>>
>> I know this is a dumb question but I'll ask it anyway.  How come Google
>> can scan books (that they get from libraries??!?) and make a huge
>> database out of it and make a ton of money off of it (not yet, but does
>> anyone think they won't?) - but libraries can't?
>>
>>
>>
>> <overdramatic  but you know what I mean> I think it's because we can't
>> get organized. We want MARC or FRBR or RDA or whatever.  And after all
>> the fields have been decided on, we want a fully developed, working tool
>> to hop out of the grass.  Then we want "other libraries" to use it for a
>> year or two to work out all the kinks, and then we'll be ready to form a
>> committee to examine whether this new tool will work for our users in
>> our specific environment.</obykwim>
>>
>>
>>
>> What if we scanned all those books for our own bad selves?  What if we
>> ripped off Google's idea of making searches against full-text?  This
>> would answer Stephen's need to find things in the book - a need that
>> librarians know about. (I regularly tell students that what they need to
>> do is go upstairs, get the book off the shelf, and then look at the
>> table of contents and index to see if the thing they're interested in is
>> covered in the book.) So we can't offer the full text of books because
>> of copyright issues (Google cut that Gordian knot, but anyway).
>> Wouldn't it help to be able to offer a clue that a specific topic, that
>> might not be a chapter heading or a book title or any other piece of
>> metadata that we would reasonably expect to create, but that is in the
>> text, is in the text?  Wouldn't it help to offer a page preview that
>> shows (in a paragraph or two) someplace that the book was mentioned?
>>
>>
>> Instead of sharing metadata through OCLC, what if we shared digital
>> copies of books?  Upload when you're done scanning, download when you
>> buy a copy of a physical book, edit when someone made a crappy scan on
>> page 32 and you can do a better one, etc? Then those scanned, uploaded,
>> downloaded books became part of our search index, like in Google books,
>> with limited previews or full text or as much as we can get away with?
>>
>>
>>
>> Joe Montibello, MLIS
>>
>> Class of 1945 Library
>>
>> Phillips Exeter Academy
>>
>> Web:
>>
>> Blog: http://academylibrary.wordpress.com
>>
>>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet