Quoting Stephen Paling <paling_at_WISC.EDU>:
> Folks have made a lot of good points, and I'd like to add several
> other quick ones.
>
> Keep in mind that Google is not using the kind of flatbed scanner
> that most of us are used to. Their scanners incorporate multiple
> cameras and software that can take into account the curve of the
> page as the book sits on the scanner, and then render a flat image.
> So it's not just a matter of buying a normal scanner.
Although Google's scanner set-up is secret and proprietary, I believe
that non-destructive scanning of the type you describe above is now
the norm for most book scanning projects. The Internet Archive has
developed its own scanning hardware. [1] There was a demo of a $300
high-speed book scanner at a conference in the last year or so. [2]
And running OCR isn't a big deal today, since there is off-the-shelf
software. What Google has that no one else has, of course, is the $$
and cycles to make computational improvements to the scans and the
derived text.
kc
[1] http://www.wired.com/entertainment/theweb/multimedia/2008/03/gallery_internet_archive
[2] http://boingboing.net/2009/04/20/howo-make-a-300-high.html
>
> Google is, I believe, using OCR on all of the books. In other words,
> within-book searches are possible. If you only have page images
> without OCR, you've only made a small amount of progress. The images
> aren't searchable.
>
> We should also ask how much mileage we'll get out of digitizing
> books. Will it help? Almost surely. Will it change the fact that the
> lion's share of information will never appear in books? No.
>
> Steve
>
> =====================================
> Stephen Paling
> Assistant Professor
> School of Library and Information Studies
> 4251 Helen C. White Hall
> 600 N. Park St.
> Madison, WI 53706-1403
> Phone: (608) 263-2944
> Fax: (608) 263-4849
> paling_at_wisc.edu
>
> ----- Original Message -----
> From: "Montibello, Joseph P." <jmontibello_at_EXETER.EDU>
> Date: Thursday, July 1, 2010 8:41 am
> Subject: [NGC4LIB] Library Books?
> To: NGC4LIB_at_LISTSERV.ND.EDU
>
>> Hi all,
>>
>>
>>
>> Stephen Paling wrote:
>>
>>
>>
>> "To put it a bit differently, what I want is ~in~ the document, not next
>> to it as a surrogate. The amount of information that is available online
>> now dwarfs the information available in print, and searching within
>> those online resources is typically far more useful to me."
>>
>>
>>
>> I know this is a dumb question but I'll ask it anyway. How come Google
>> can scan books (that they get from libraries??!?) and make a huge
>> database out of it and make a ton of money off of it (not yet, but does
>> anyone think they won't?) - but libraries can't?
>>
>>
>>
>> <overdramatic but you know what I mean> I think it's because we can't
>> get organized. We want MARC or FRBR or RDA or whatever. And after all
>> the fields have been decided on, we want a fully developed, working tool
>> to hop out of the grass. Then we want "other libraries" to use it for a
>> year or two to work out all the kinks, and then we'll be ready to form a
>> committee to examine whether this new tool will work for our users in
>> our specific environment.</obykwim>
>>
>>
>>
>> What if we scanned all those books for our own bad selves? What if we
>> ripped off Google's idea of running searches against full text? This
>> would answer Stephen's need to find things in the book - a need that
>> librarians know about. (I regularly tell students that what they need to
>> do is go upstairs, get the book off the shelf, and then look at the
>> table of contents and index to see if the thing they're interested in is
>> covered in the book.) So we can't offer the full text of books because
>> of copyright issues (Google cut that Gordian knot, but anyway).
>> Wouldn't it help to be able to offer a clue that a specific topic, that
>> might not be a chapter heading or a book title or any other piece of
>> metadata that we would reasonably expect to create, but that is in the
>> text, is in the text? Wouldn't it help to offer a page preview that
>> shows (in a paragraph or two) someplace that the topic was mentioned?
>>
>>
>> Instead of sharing metadata through OCLC, what if we shared digital
>> copies of books? Upload when you're done scanning, download when you
>> buy a copy of a physical book, edit when someone made a crappy scan on
>> page 32 and you can do a better one, etc? Then those scanned, uploaded,
>> downloaded books became part of our search index, like in Google Books,
>> with limited previews or full text or as much as we can get away with?
>>
>>
>>
>> Joe Montibello, MLIS
>>
>> Class of 1945 Library
>>
>> Phillips Exeter Academy
>>
>> Web:
>>
>> Blog: http://academylibrary.wordpress.com
>
--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Fri Jul 02 2010 - 12:04:23 EDT