Google Book Search mentions - was Re: our profession's bibliographic information

From: Cindy Harper <charper_at_nyob> Date: Wed, 22 Dec 2010 10:19:07 -0500 To: NGC4LIB_at_LISTSERV.ND.EDU

Thanks Jim! I didn't know.  That cuts out a lot of false hits for cites like
title=John, Author=Mark Edwards, especially if I include a publisher term:

john AROUND (mark AROUND(2) edwards) AROUND blackwell

But for some multi-edition items, I wouldn't want to limit by publisher.
What I could do is limit by publisher only when my base database (items in
my library collection) has no duplicate author/title combos. Maybe I'll
revisit that project.
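
A minimal sketch of the publisher rule described above, assuming the local collection is available as a list of records with "title", "author", and "publisher" keys (a hypothetical layout, not the actual project data). The function names are illustrative only. It builds query strings in the same form as the example query and adds the publisher term only when the author/title combination occurs once in the collection. Whether Google Books honors AROUND as intended is a separate, open question.

```python
from collections import Counter


def build_query(title, author, publisher=None):
    """Build a proximity query string in the form used in the message above."""
    # Join the author's name parts with AROUND(2), e.g. "mark AROUND(2) edwards".
    author_part = " AROUND(2) ".join(author.lower().split())
    title_part = title.lower()
    if " " in title_part:
        # Assumption: multi-word titles are searched as a quoted phrase.
        title_part = f'"{title_part}"'
    query = f"{title_part} AROUND ({author_part})"
    if publisher:
        query += f" AROUND {publisher.lower()}"
    return query


def queries_for_collection(records):
    """Limit by publisher only when an author/title combo is unique locally."""
    combos = Counter((r["author"].lower(), r["title"].lower()) for r in records)
    queries = []
    for r in records:
        key = (r["author"].lower(), r["title"].lower())
        publisher = r["publisher"] if combos[key] == 1 else None
        queries.append(build_query(r["title"], r["author"], publisher))
    return queries
```

Calling build_query("John", "Mark Edwards", "Blackwell") reproduces the query shown above. Records that share an author/title pair (multiple editions) get the same publisher-free query.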

Cindy Harper, Systems Librarian
Colgate University Libraries
charper_at_colgate.edu
315-228-7363

On Wed, Dec 22, 2010 at 9:59 AM, Weinheimer Jim <j.weinheimer_at_aur.edu> wrote:

> I'm not sure whether the AROUND operator actually works as it is supposed
> to in Google Books, but the search results do change.
> http://www.labnol.org/internet/google-around-search-operator/18251/
>
>
> James Weinheimer  j.weinheimer_at_aur.edu
> Director of Library and Information Services
> The American University of Rome
> via Pietro Roselli, 4
> 00153 Rome, Italy
> voice- 011 39 06 58330919 ext. 258
> fax-011 39 06 58330992
> First Thus: http://catalogingmatters.blogspot.com/
> Cooperative Cataloging Rules:
> http://sites.google.com/site/opencatalogingrules/
>
> -----Original Message-----
> From: Next generation catalogs for libraries [mailto:
> NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Cindy Harper
> Sent: Wednesday, December 22, 2010 3:53 PM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] our profession's bibliographic information
>
> I played with a project looking at the number of "mentions" (author + title
> mentions) in the Google Books corpus.  Unfortunately, since there's no
> proximity searching in Google Books, there's no way AFAIK to weed out the
> false hits. Maybe a similar thing could be done with Hathi Trust data? Do
> you know of any indexing software with proximity searching (same sentence?)
> that could be used for such a project?
>
> Cindy Harper, Systems Librarian
> Colgate University Libraries
> charper_at_colgate.edu
> 315-228-7363
>
>
>
> On Tue, Dec 21, 2010 at 2:55 PM, Eric Lease Morgan <emorgan_at_nd.edu> wrote:
>
> > >> In the context of my previous message, there are two types of data:
> > >> 1) quantitative, and 2) qualitative. The former is applicable to
> > >> mathematical processes. The latter is not.
> > >
> > > But you can quantify what you call qualitative data, that is, data
> > > that is not numeric. You can count anything, as the applications that
> > > are making use of full text are doing. You can make "more related to"
> > > calculations even using words ("this word is more related to another
> > > word than that word" or "A has a greater relationship to B than C has
> > > a relation to B"). I'm not sure why you would limit yourself to
> > > numerical data, rather than countable data. Once you count, you turn
> > > your data into quantity. Based on the nature of our data, I think
> > > that's where we'll get bang for our computational buck.
> >
> >
> > Only things that are represented as numbers are countable. I can't count
> > The Adventures of Huckleberry Finn. Nor can I count Origami--Juvenile
> > works. Yes, I can count the number of books by Mark Twain a library owns,
> > and I can count the number of works related to paper craft, but these
> > tabulations tell me about the collection. I want to produce quantitative
> > information on works, not the catalog. For example, some measurable
> > characteristics of works may include:
> >
> >  * Big Name index (percentage of quotes from leading authorities)
> >  * color index (normalized percentage of color words used)
> >  * date written
> >  * grade level
> >  * Great Ideas index (percentage of philosophy ideas in text)
> >  * length in words
> >  * librarian rating
> >  * number of citations
> >  * number of editions
> >  * number of graphics
> >  * number of pictures
> >  * number of prizes won
> >  * number of times circulated
> >  * percentage of languages used in a text
> >  * percentage of mathematical formulas in a text
> >  * percentage of unique words in a text
> >  * price
> >  * publisher rating
> >  * readability score
> >  * reader rating
> >
> > Given imagination, I'm sure many more quantifiable characteristics could
> > be enumerated.
> >
> > Once done, these characteristics can be compared to one another, and they
> > can be used from two sides of the same problem. On one hand such
> > characteristics can be integrated into "discovery systems" (catalogs) to
> > assist the reader in identifying items for use. "I want a book that is
> > popular, contains a minimum of mathematical formulas, has many citations
> > and illustrations, but is not too difficult to read." On the other hand, a
> > person could identify an item not in a collection, feed the item to a
> > system for analysis, and return a list of characteristics about the item.
> > "This item is longer than most, has many citations, is expensive, has a
> > low reader rating, and is not very 'colorful'." Finally, some sort of
> > graph or chart could be drawn literally illustrating the characteristics
> > of a given work.
> >
> > Granted, none of this was feasible a decade ago since there was little
> > full text. Things are changing. Things are different now. Full text is
> > becoming the norm, and this opens up all sorts of possibilities. Somebody
> > is going to do this sort of work, if it isn't being investigated already.
> > Libraries are not about books. They are about what is inside the books. We
> > need to be providing tools enabling our constituents to use these insides
> > lest the profession become marginalized. Find is not as much of a problem
> > to solve as it used to be. People can find more than they need, and the
> > amount of effort needed to find more is past the point of diminishing
> > returns. Instead, use and understanding is the name of the game.
> > Measurement is a standard means to understanding. Quantification is a
> > necessary element of measurement.
> >
> > --
> > Eric Lease Morgan
> > "Take the Great Books Survey -- http://bit.ly/auPD9Q"
> >
>
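
Three of the characteristics in the list quoted above (length in words, percentage of unique words, and color index) can be computed directly from full text. What follows is a rough sketch, not the method from the original message. The tokenizer, the COLOR_WORDS list, and the function names are all assumptions chosen for illustration, and the "color index" here is a plain percentage of color words, since the message does not say how it should be normalized.

```python
import re

# Assumed, deliberately small list of color words; a real index would need
# a curated vocabulary.
COLOR_WORDS = {
    "red", "orange", "yellow", "green", "blue", "purple",
    "black", "white", "brown", "gray", "grey", "pink",
}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def measure(text):
    """Return a few quantitative characteristics of a work's full text."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {
            "length_in_words": 0,
            "percent_unique_words": 0.0,
            "color_index": 0.0,
        }
    color_count = sum(1 for w in words if w in COLOR_WORDS)
    return {
        "length_in_words": total,
        "percent_unique_words": 100.0 * len(set(words)) / total,
        "color_index": 100.0 * color_count / total,
    }
```

Measures computed this way for many works could be stored alongside catalog records and compared across items, which is the kind of use the quoted message describes.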