On 8/31/07, Karen Coyle <kcoyle_at_kcoyle.net> wrote:
> There are a lot
> of things that just can't be resolved in the course of an email
> exchange. And some technologies do not have a good "... for dummies"
> document that can help someone get started.
It's a bit worse than that, unfortunately. These systems are mostly
found where you'd expect them, in projects that are heavily funded and
quite hush-hush, such as parliamentary corpus analysis, military
intelligence, counterintelligence, and so forth. There are also some
searches one can use to find more public material:
A popular option these days (because it requires a smaller corpus for analysis):
http://www.google.com/search?q=latent+semantic+parsing+project
The two most relevant Wikipedia entries:
http://en.wikipedia.org/wiki/Artificial_intelligence
http://en.wikipedia.org/wiki/Machine_learning
(both of which are just top-level categories; there's enough in
there for years of study)
Generic searches:
http://www.google.com/search?q=automatic+free+text+classification
I know about the UN corpus being automatically classified, but
couldn't find a link just now. It might be in the results above.
As for more academic papers, which is perhaps what people want, I
don't know, as I haven't been in that game for almost 10 years. But I
think it's a fair assumption that they can do better now than they did
10 years ago, no? Anyway, maybe I should cough up some stuff that
certainly threatens the catalogers' status quo and post a follow-up.
Alex
--
---------------------------------------------------------------------------
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------
Received on Thu Aug 30 2007 - 17:40:54 EDT