Eric,
Interesting.
You're talking only about the full text of English-language documents
here, right?
But even then, there's lots of room for fundamental ambiguity and
uncertainty in your source of data, it seems to me. To deal with
that meaningfully, the software would have to be enormously
sophisticated, no? Can we even make software that
sophisticated?
And while I don't disagree with your two closing sentences below,
far from it, and you have made these important points here before,
I still don't really get (yet) the ultimate point of this kind of
POS usage analysis as such (as opposed to the clearly important
"discovery" possibilities you'd broached in previous posts).
Where are you headed and why (apart from just being able to do
it, of course, which may be kinda nice in itself)? What value-added
functionality or product is in this case awaiting all those
appreciative library users down the road (i.e., those who are left
:-])? You may have a clear, or rough, idea, but I don't as yet.
(Maybe I'm stupidly overlooking something.)
[And of course I'm wondering what of significance one could hope
to say, if anything at all, in the cases of those two authors on
your list who were presumably being represented not by what they
wrote but by what their translators made of it in a language which
in numerous ways works quite differently to the one they
themselves employed.]
- Laval Hunsucker
Breukelen, Nederland
----- Original Message ----
From: Eric Lease Morgan <emorgan_at_ND.EDU>
To: NGC4LIB_at_LISTSERV.ND.EDU
Sent: Mon, February 7, 2011 2:10:28 PM
Subject: [NGC4LIB] parts-of-speech
For the past year or so I have been dabbling with text mining, and my latest
foray surrounded the analysis of parts-of-speech (POS) in full text.
With the advent of so much full text, it seems logical to me to figure out ways
to describe individual items -- as well as our collections as a whole -- by
analyzing more than the most basic of bibliographic information. Based on my
initial and rudimentary investigations, differentiating texts on POS is not
promising. From my blog posting:
I now have the tools necessary to answer one of my initial
questions, "Do some works contain a greater number of nouns,
verbs, and adjectives than others?"... The result was very
surprising to me. Despite the wide range of document sizes, and
despite the wide range of genres, the relative percentages of POS
are very similar across all of the documents... Based on this
foray and rudimentary analysis the answers are, "No, there are
not significant differences, and no, works do not contain
different number of nouns, verbs, adjectives, etc."
http://bit.ly/hsxD2i
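
For readers who want to see what "relative percentages of POS" means in
practice, here is a minimal sketch. It is not the code behind the blog
posting; it assumes each document has already been tagged into
(word, tag) pairs by some POS tagger, and the function name, tag labels,
and sample data are hypothetical.

```python
from collections import Counter


def pos_percentages(tagged_tokens):
    """Return {tag: percent of all tokens} for a list of (word, tag) pairs."""
    counts = Counter(tag for _word, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty document: no percentages to report
    return {tag: 100.0 * n / total for tag, n in counts.items()}


# Hypothetical, hand-tagged samples; a real run would use a tagger's output.
short_doc = [("dogs", "NOUN"), ("bark", "VERB"), ("loudly", "ADV"), ("cats", "NOUN")]
long_doc = short_doc * 50  # same mix of tags, fifty times the length

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name, pos_percentages(doc))
```

Because each count is divided by the document's own total, documents of
very different sizes can be compared directly.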
By exploiting the existence of full text, library "discovery systems" can be so
much more functional and useful. We need to be taking advantage of our
environment to a much greater degree.
--
Eric Lease Morgan
University of Notre Dame
Great Books Survey -- http://bit.ly/auPD9Q
Received on Wed Feb 09 2011 - 12:40:19 EST