Both SCOPUS and Web of Science have implemented author disambiguation
features in the past year, designed to do the kind of thing that Conal
describes in order to determine which among authors of similar names (or
variant names) is the right author for a particular article. I don't
have any sense of whether these approaches do the job better than, worse
than, or just as well as humans, but they are real-world applications
currently being actively marketed.
Here's a link to an Information Today article from July 2006 that
describes them.
http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16997
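For what it's worth, the core of such a feature can be sketched in a few
lines: normalize name variants to a common key, then use shared context
(co-authors, affiliation strings) to decide whether two records belong to
the same person. The Python below is purely an illustration with made-up
record fields and thresholds, not a description of what either vendor
actually does.

    from difflib import SequenceMatcher

    def normalize(name):
        # Reduce a name to "surname + first initial" so variants such as
        # "Smith, J. A." and "J Smith" collapse to the same key.
        name = name.lower().replace(".", "")
        if "," in name:
            surname, _, givens = name.partition(",")
        else:
            parts = name.split()
            surname, givens = parts[-1], " ".join(parts[:-1])
        given_parts = givens.split()
        initial = given_parts[0][0] if given_parts else ""
        return surname.strip() + " " + initial

    def same_author(rec_a, rec_b, threshold=0.5):
        # Heuristic: the name variants must match, and the records must
        # share some context (a co-author, or similar affiliation strings).
        if normalize(rec_a["name"]) != normalize(rec_b["name"]):
            return False
        shared_coauthors = set(rec_a["coauthors"]) & set(rec_b["coauthors"])
        affiliation_similarity = SequenceMatcher(
            None, rec_a["affiliation"].lower(),
            rec_b["affiliation"].lower()).ratio()
        return bool(shared_coauthors) or affiliation_similarity >= threshold

    a = {"name": "Smith, J. A.",
         "affiliation": "Univ. of Alabama at Birmingham",
         "coauthors": ["Jones, K."]}
    b = {"name": "J Smith",
         "affiliation": "University of Alabama, Birmingham",
         "coauthors": ["Jones, K.", "Lee, P."]}
    print(same_author(a, b))  # True: same normalized name, shared co-author

Real systems add many more signals (subject areas, citation links, journals),
but the basic shape of the task is the same.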
T. Scott Plutchak
Director, Lister Hill Library of the Health Sciences
University of Alabama at Birmingham
tscott_at_uab.edu
-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of James Weinheimer
Sent: Friday, August 31, 2007 9:12 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Resignation
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Conal Tuohy
> Sent: Friday, August 31, 2007 3:02 AM
> To: NGC4LIB_at_listserv.nd.edu
> Subject: Re: [NGC4LIB] Resignation
> I'm assuming you're asking how a machine can decide that a given work
> was authored by one (or none) of the above?
>
> If the full text of the books is available, this is actually quite a
> feasible task which can be done by unsupervised machine-learning
> algorithms. Every author has an authorial "fingerprint" which can be
> recognised by attentive readers, and Bayesian statistical techniques
> are even better at picking up such things. The key data for these
> algorithms are the frequency of use and co-occurrences of particular
> words, sentence-lengths, etc. It in no way requires AI capable of
> "understanding" the subject of the text, in the sense that a human
> reader can. The statistical patterns which these algorithms recognise
> are ones which are generally below the conscious perception of human
> readers (who instead tend to focus on what a text actually means).
>
> This is an area where we should expect computers to out-perform
> humans, frankly.
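For concreteness, the kind of stylometric comparison described above can be
sketched roughly as follows: build an authorial "fingerprint" from
function-word frequencies and average sentence length, then attribute a
disputed text to the nearest known author. The word list, toy texts, and
distance measure here are illustrative assumptions, not anyone's production
system (real work uses far richer features and Bayesian models).

    import math
    import re

    FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is",
                      "was", "for", "with", "as", "but", "not", "which"]

    def fingerprint(text):
        # Relative frequency of each function word, plus mean sentence length.
        words = re.findall(r"[a-z']+", text.lower())
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        total = len(words) or 1
        freqs = [words.count(w) / total for w in FUNCTION_WORDS]
        mean_sentence_len = len(words) / (len(sentences) or 1)
        return freqs + [mean_sentence_len / 100.0]  # crude scaling

    def distance(fp_a, fp_b):
        # Euclidean distance between two fingerprint vectors.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

    def attribute(disputed, known_samples):
        # Pick the known author whose fingerprint is closest.
        target = fingerprint(disputed)
        return min(known_samples,
                   key=lambda author: distance(target,
                                               fingerprint(known_samples[author])))

    known = {
        "Author A": "The ship sailed at dawn. It was not the first voyage.",
        "Author B": "Which of these results is correct? That is the question.",
    }
    # Prints whichever author's fingerprint is nearer to the disputed text.
    print(attribute("It was not the storm that mattered, but the ship.", known))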
Then show us. I have read so many "shoulds" in my life, and while some of
them seem to make sense, I haven't seen them work in practice yet. (Alchemy
made a lot of sense, too!) People talk about the great "automatic
translation," but what I've seen is still a disaster.
A former professor of mine worked his entire life on automatic translation,
only to declare it impossible in the end. The best you could do was create a
text for a human to edit. As he said, if you need a human to edit it anyway,
why go through it in the first place? He was speaking quite some time back,
and automatic translation has come some way since, but it's not there yet.
And neither is automatic subject analysis.
We can experiment to our heart's desire, but we cannot draw conclusions
based on ifs, maybes, and shoulds. We have seen that "should" and "maybe"
may come around in 50 or more years, if ever, or they may come around next
week.
Regards,
Jim
James Weinheimer j.weinheimer_at_aur.edu
Director of Library and Information Services The American University of
Rome via Pietro Roselli, 4
00153 Rome, Italy
voice: 011 39 06 58330919 ext. 327
fax: 011 39 06 58330992
Received on Fri Aug 31 2007 - 10:23:06 EDT