Re: Library Books?

From: Stephen Paling <paling_at_nyob> Date: Thu, 1 Jul 2010 20:40:17 -0500 To: NGC4LIB_at_LISTSERV.ND.EDU

Folks have made a lot of good points, and I'd like to add several other quick ones.

Keep in mind that Google is not using the kind of flatbed scanner that most of us are used to. Their scanners incorporate multiple cameras and software that can take into account the curve of the page as the book sits on the scanner, and then render a flat image. So it's not just a matter of buying a normal scanner.

Google is, I believe, using OCR on all of the books. In other words, within-book searches are possible. If you only have page images without OCR, you've only made a small amount of progress. The images aren't searchable.
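To make the OCR point concrete, here is a minimal Python sketch of a page-level word index. It assumes each page's OCR output is already available as plain text. The names `build_page_index` and `search` are hypothetical, and real systems use far more sophisticated text analysis and ranking. The point it illustrates is simple: a book that has only page images contributes nothing to the index, so nothing inside it can be found.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for real text analysis."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_page_index(books):
    """Map each term to the set of (book_id, page_number) pairs where it occurs.

    `books` is assumed to be {book_id: [ocr_text_of_page_1, ...]}.
    A book with page images but no OCR text has no pages here and adds nothing.
    """
    index = defaultdict(set)
    for book_id, pages in books.items():
        for page_number, text in enumerate(pages, start=1):
            for term in tokenize(text):
                index[term].add((book_id, page_number))
    return index

def search(index, query):
    """Return sorted (book_id, page_number) hits that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)

books = {
    "scanned-with-ocr": [
        "Chapter 1: Introduction",
        "Early printing in Venice and the Aldine Press",
    ],
    "images-only": [],  # scanned, but no OCR text, so nothing to search
}
index = build_page_index(books)
print(search(index, "aldine press"))  # [('scanned-with-ocr', 2)]
```

Indexing at the page level, rather than per book, is what lets a search point a reader to a specific page instead of just a title.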

We should also ask how much mileage we'll get out of digitizing books. Will it help? Almost surely. Will it change the fact that the lion's share of information will never appear in books? No.

Steve

=====================================
Stephen Paling
Assistant Professor
School of Library and Information Studies
4251 Helen C. White Hall
600 N. Park St.
Madison, WI 53706-1403
Phone: (608) 263-2944
Fax: (608) 263-4849
paling_at_wisc.edu

----- Original Message -----
From: "Montibello, Joseph P." <jmontibello_at_EXETER.EDU>
Date: Thursday, July 1, 2010 8:41 am
Subject: [NGC4LIB] Library Books?
To: NGC4LIB_at_LISTSERV.ND.EDU

> Hi all,
> 
> Stephen Paling wrote: 
> 
> "To put it a bit differently, what I want is ~in~ the document, not next
> to it as a surrogate. The amount of information that is available online
> now dwarfs the information available in print, and searching within
> those online resources is typically far more useful to me."
> 
> I know this is a dumb question but I'll ask it anyway.  How come Google
> can scan books (that they get from libraries??!?) and make a huge
> database out of it and make a ton of money off of it (not yet, but does
> anyone think they won't?) - but libraries can't?
> 
> <overdramatic but you know what I mean> I think it's because we can't
> get organized. We want MARC or FRBR or RDA or whatever. And after all
> the fields have been decided on, we want a fully developed, working tool
> to hop out of the grass. Then we want "other libraries" to use it for a
> year or two to work out all the kinks, and then we'll be ready to form a
> committee to examine whether this new tool will work for our users in
> our specific environment.</obykwim>
> 
> What if we scanned all those books for our own bad selves? What if we
> ripped off Google's idea of making searches against full text? This
> would answer Stephen's need to find things in the book, a need that
> librarians know about. (I regularly tell students that what they need to
> do is go upstairs, get the book off the shelf, and then look at the
> table of contents and index to see if the thing they're interested in is
> covered in the book.) Granted, we can't offer the full text of books
> because of copyright issues (Google cut that Gordian knot, but anyway).
> But wouldn't it help to be able to signal that a specific topic is
> covered in the text, even when it isn't a chapter heading, a book title,
> or any other piece of metadata we would reasonably expect to create?
> Wouldn't it help to offer a page preview that shows (in a paragraph or
> two) where that topic is mentioned in the book?
> 
> Instead of sharing metadata through OCLC, what if we shared digital
> copies of books? Upload when you're done scanning, download when you
> buy a copy of a physical book, edit when someone has made a crappy scan
> of page 32 and you can do a better one, etc.? Then those scanned,
> uploaded, and downloaded books would become part of our search index,
> as in Google Books, with limited previews or full text or as much as we
> can get away with.
> 
> Joe Montibello, MLIS
> Class of 1945 Library
> Phillips Exeter Academy
> Web:
> Blog: http://academylibrary.wordpress.com
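Joe's idea of a limited preview, showing a paragraph-sized excerpt where a topic appears without exposing the full text, could work roughly like the following minimal Python sketch. The `snippet` function, its `context_words` window, and the sample text are hypothetical illustrations, not how Google Books or any library catalog actually builds previews.

```python
import re

def _normalize(word):
    """Lowercase a word and remove punctuation (keeping apostrophes) for matching."""
    return re.sub(r"[^\w']", "", word).lower()

def snippet(page_text, phrase, context_words=8):
    """Return a short excerpt around the first occurrence of `phrase`, or None.

    At most `context_words` words are shown on each side of the match, so the
    preview signals that the topic is covered without reproducing the page.
    """
    words = page_text.split()
    target = [_normalize(w) for w in phrase.split()]
    if not target:
        return None
    normalized = [_normalize(w) for w in words]
    n = len(target)
    for i in range(len(words) - n + 1):
        if normalized[i:i + n] == target:
            start = max(0, i - context_words)
            end = min(len(words), i + n + context_words)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return None  # phrase not found on this page

text = (
    "The printers of Venice worked closely with scholars, and the Aldine "
    "Press issued small portable editions of the classics."
)
print(snippet(text, "aldine press", context_words=2))
# ... and the Aldine Press issued small ...
```

The window size is the knob that would let a library decide how much text to show, from a one-line hint up to the "paragraph or two" Joe describes, depending on what copyright allows.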