The process of recognising names of people, places and the like in free
text is known as Named Entity Recognition (NER).
I would expect that they are using some sort of finite-state transducer
(regular expressions on steroids).
See:
Wikipedia article: http://en.wikipedia.org/wiki/Named_Entity_Recognition
Message Understanding Conferences (MUC) -
http://www.itl.nist.gov/iaui/894.02/related_projects/muc/index.html
GATE (well supported tool chain for textual extraction)
http://www.gate.ac.uk/
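
To make Karen's guess below concrete, here is a minimal Python sketch of
"longest matching string within n words": look up every window of up to n
words around the clicked word in a list of known phrases and keep the
longest hit. This is only an illustration of that guess, not what
answers.com actually does. The GAZETTEER contents, MAX_WORDS and the
function names are all made up for the example; a real system would use a
far larger dictionary and probably a finite-state matcher rather than a
Python set.

```python
import re

# Hypothetical phrase list (a "gazetteer"); entries are illustrative only.
GAZETTEER = {
    "film noir",
    "world war ii",
    "american university of rome",
}

MAX_WORDS = 4  # the "n" in "longest matching string within n words"


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9']+", text)


def longest_match(tokens, clicked, gazetteer=GAZETTEER, max_words=MAX_WORDS):
    """Return the longest known phrase that contains tokens[clicked].

    Windows are tried from widest to narrowest, so the first hit is the
    longest one. Falls back to the clicked word when nothing matches.
    """
    for size in range(max_words, 0, -1):
        # Every window of `size` words that still covers the clicked word.
        first = max(0, clicked - size + 1)
        last = min(clicked, len(tokens) - size)
        for start in range(first, last + 1):
            phrase = " ".join(tokens[start:start + size])
            if phrase.lower() in gazetteer:
                return phrase
    return tokens[clicked]


title = tokenize("Blackout : World War II and the origins of film noir")
print(longest_match(title, title.index("film")))
```

With this toy phrase list, clicking "film" in Jim's example title yields
"film noir", and a word with no known phrase around it is returned
unchanged. Capitalisation could be layered on top as an extra cue for
names, but that is left out to keep the sketch short.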
Simon
On Tue, Apr 8, 2008 at 10:30 AM, Karen Coyle <kcoyle_at_kcoyle.net> wrote:
> Aha! yes, it does seem to pick up the context in some cases -- oh, I
> really want to know what that algorithm is! They definitely are using
> capitalization to get full names and place names. But it's much more
> sophisticated than that. I would hazard a guess: the longest matching
> string within "n" words?
>
> kc
>
>
> James Weinheimer wrote:
>
> > It does work with multiple words, although I don't understand how: it's
> > a black box.
> > For example, the record for "Blackout : World War II and the origins of
> > film noir / Sheri Chinen Biesen" in
> > http://www.galileo.aur.it/cgi-bin/koha/opac-detail.pl?bib=1272
> >
> > If you click on "film" in the subtitle, you will get "film noir."
> >
> > But I don't understand how it works. Also, if you click on names, you
> > might get your person, but you might get the soccer/football star. In
> > any case, in the box, there is: Match: film noir and others. Here, you
> > can click on "others" and get some choices.
> >
> > Jim
> >
> > James Weinheimer j.weinheimer_at_aur.edu
> > Director of Library and Information Services
> > The American University of Rome
> > via Pietro Roselli, 4
> > 00153 Rome, Italy
> > voice- 011 39 06 58330919 ext. 327
> > fax-011 39 06 58330992
> >
> >
> > -----Original Message-----
> > > From: Next generation catalogs for libraries
> > > [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
> > > Sent: Tuesday, April 08, 2008 3:51 PM
> > > To: NGC4LIB_at_LISTSERV.ND.EDU
> > > Subject: Re: [NGC4LIB] NGC4LIB Digest - 4 Apr 2008 to 7 Apr 2008
> > > (#2008-72)
> > >
> > > I, too, hate anything that pops up or acts as I move about the screen,
> > > or that comes between me and what I'm actually trying to do. I hadn't
> > > thought of the copy/paste, but I did note that it doesn't work on
> > > multiple words, just single words. In many cases, single words are not
> > > the target. (Try: American | University | Rome ;-))
> > >
> > > kc
> > >
> > > James Weinheimer wrote:
> > >
> > > > That's interesting. I did write to answers.com to ask if they could
> > > > allow some additional flexibility in event handlers, perhaps with a
> > > > right click, or some other way. Currently, it also doesn't work with
> > > > text that is linked, which--I would say--are some of the more
> > > > important words on a page.
> > > >
> > > > Jim
> > > >
> > > > James Weinheimer j.weinheimer_at_aur.edu
> > > > Director of Library and Information Services
> > > > The American University of Rome
> > > > via Pietro Roselli, 4
> > > > 00153 Rome, Italy
> > > > voice- 011 39 06 58330919 ext. 327
> > > > fax-011 39 06 58330992
> > > >
> > > > -----Original Message-----
> > > > > From: Next generation catalogs for libraries
> > > > > [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Selden Deemer
> > > > > Sent: Tuesday, April 08, 2008 12:46 PM
> > > > > To: NGC4LIB_at_LISTSERV.ND.EDU
> > > > > Subject: Re: [NGC4LIB] NGC4LIB Digest - 4 Apr 2008 to 7 Apr 2008
> > > > > (#2008-72)
> > > > >
> > > > > This is one of the "enhancements" of the NYT that drives me crazy.
> > > > > I frequently select text for various purposes, including lookups
> > > > > using the LibX plugin, and dislike intensely the intervention of
> > > > > the Answer Tips lookup.
> > > > >
> > > > > Selden Deemer, Library Systems Administrator
> > > > > Emory University Libraries
> > > > > Atlanta, Georgia
> > > > > EMAIL: libssd_at_emory.edu
> > > > > PHONE: 404-727-0271
> > > > > FAX: 404-727-0827
> > > > >
> > > > >
> > > > >
> > > > > On Apr 7, 2008, at 11:01 PM, Automatic digest processor wrote:
> > > > >
> > > > > > There is one message totalling 26 lines in this issue.
> > > > > >
> > > > > > Topics of the day:
> > > > > >
> > > > > > 1. Answer Tips
> > > > > >
> > > > > >
> > > > > > ----------------------------------------------------------------------
> > > > > >
> > > > > > Date: Mon, 7 Apr 2008 10:09:21 +0200
> > > > > > From: Weinheimer Jim <j.weinheimer_at_AUR.EDU>
> > > > > > Subject: Answer Tips
> > > > > >
> > > > > > All,
> > > > > >
> > > > > > I don't know if anybody has implemented the Answer Tips from
> > > > > > answers.com. I saw it on the NY Times this weekend and decided to
> > > > > > try to implement it. An example record is at:
> > > > > >
> > > > > > http://www.galileo.aur.it/cgi-bin/koha/opac-detail.pl?bib=19902
> > > > > >
> > > > > > Double-clicking on any word in the record searches answers.com.
> > > > > > Very nice. It doesn't work with text that's already linked,
> > > > > > however, so I'm thinking of reworking my record display a little
> > > > > > bit.
> > > > > >
> > > > > > And best of all, it only took about 10 seconds to implement!
> > > > > >
> > > > > > James Weinheimer
> > > > > > The American University of Rome
> > > > > > Rome, Italy
> > > > > >
> > > > > > ------------------------------
> > > > > >
> > > > > > End of NGC4LIB Digest - 4 Apr 2008 to 7 Apr 2008 (#2008-72)
> > > > > > ***********************************************************
> > > > > >
> > > > > >
> > > > > >
> > > > --
> > > -----------------------------------
> > > Karen Coyle / Digital Library Consultant
> > > kcoyle@kcoyle.net http://www.kcoyle.net
> > > ph.: 510-540-7596 skype: kcoylenet
> > > fx.: 510-848-3913
> > > mo.: 510-435-8234
> > > ------------------------------------
> > >
> >
> >
> >
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kcoyle@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596 skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
>
Received on Tue Apr 08 2008 - 11:45:57 EDT