Obviously, Jim is not one of the faithful.
Let me repeat this:
In order to help along the "doubting Thomases" among the catalogers, let
me make a plea. I think it should be super, super easy to do. Why
doesn't someone start with Conal Tuohy's claim about our current
capabilities (using Bayesian statistics) re: all of the David Johnsons
James Weinheimer informed us of? I know something about science and
research, so this ought to be easy enough to empirically test. First,
get all the works (only text, I assume?) of all the David Johnsons. Of
course, *ironically* [note: this is an addition to this quote] *in order
to even get started here* I don't see how you would be able to avoid
needing to use something like OCLC's WorldCat (made possible with its
wonderful authority control, thank you!) in order to find most, if not
all, of these works. Then all you would need to do is scan them, run the
test, and find out whether it worked. I think this would be very,
very helpful - and I want help. Does anyone have the means of doing
this? (end)
Please note: this is not a demand; it is a request. I think this would
be very, very helpful. And I think this would be very, very easy to do
as well (maybe not over a lunch break, but you know what I mean).
Regards,
Nathan Rinne
Media Cataloging Technician
ISD 279 - Educational Service Center (ESC)
11200 93rd Ave. North
Maple Grove, MN. 55369
Work phone: 763-391-7183
-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of James Weinheimer
Sent: Friday, August 31, 2007 9:12 AM
To: NGC4LIB_at_listserv.nd.edu
Subject: Re: [NGC4LIB] Resignation
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Conal Tuohy
> Sent: Friday, August 31, 2007 3:02 AM
> To: NGC4LIB_at_listserv.nd.edu
> Subject: Re: [NGC4LIB] Resignation
> I'm assuming you're asking how a machine can decide that a given work
> was authored by one (or none) of the above?
>
> If the full text of the books is available, this is actually quite a
> feasible task which can be done by unsupervised machine-learning
> algorithms. Every author has an authorial "fingerprint" which can be
> recognised by attentive readers, and Bayesian statistical techniques are
> even better at picking up such things. The key data for these algorithms
> are the frequency of use and co-occurrences of particular words,
> sentence-lengths, etc. It in no way requires AI capable of
> "understanding" the subject of the text, in the sense that a human
> reader can. The statistical patterns which these algorithms recognise
> are ones which are generally below the conscious perception of human
> readers (who instead tend to focus on what a text actually means).
>
> This is an area where we should expect computers to out-perform humans,
> frankly.
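To make the technique quoted above a little more concrete, here is a minimal sketch of Bayesian authorship attribution in Python. It is an illustration under stated assumptions, not anything Conal or anyone else in this thread actually ran: it swaps in a *supervised* multinomial naive Bayes classifier over word frequencies (simpler than the unsupervised methods Conal mentions), it ignores co-occurrences and sentence lengths, and it needs at least one text of known authorship per candidate author. The class name `NaiveBayesAuthorship` and the author labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; punctuation and digits are discarded."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAuthorship:
    """Hypothetical helper: multinomial naive Bayes over word counts.

    Each author is modelled only by how often they use each word
    (a bag of words); add-one (Laplace) smoothing keeps words an
    author never used from zeroing out that author's score.
    """

    def __init__(self):
        self.word_counts = {}        # author -> Counter of word frequencies
        self.total_words = {}        # author -> total tokens seen in training
        self.doc_counts = Counter()  # author -> number of training documents
        self.vocab = set()           # every word seen in any training text

    def train(self, author, text):
        """Add one text of known authorship to the model."""
        tokens = tokenize(text)
        self.word_counts.setdefault(author, Counter()).update(tokens)
        self.total_words[author] = self.total_words.get(author, 0) + len(tokens)
        self.doc_counts[author] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        """Return log P(author) + sum of log P(word | author) per author."""
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for author, counts in self.word_counts.items():
            # Prior: this author's share of the training documents.
            score = math.log(self.doc_counts[author] / n_docs)
            denom = self.total_words[author] + v
            for tok in tokens:
                # Smoothed likelihood of this word under this author.
                score += math.log((counts[tok] + 1) / denom)
            scores[author] = score
        return scores

    def predict(self, text):
        """Attribute the text to the highest-scoring author."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    model = NaiveBayesAuthorship()
    # Hypothetical labels standing in for two different David Johnsons.
    model.train("David Johnson (A)",
                "the soil and the crop rotation of the prairie farm")
    model.train("David Johnson (B)",
                "the sonnet and the meter of the elegy in verse")
    print(model.predict("crop yields on the farm"))  # → David Johnson (A)
```

In the kind of test Nathan proposes, the training texts would be scanned works already attributed to each David Johnson, and accuracy would be measured on held-out works whose authorship is known. One toy sentence per author, as above, says nothing about real-world accuracy.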
Then show us. I have read so many "shoulds" in my life, and some of them
may seem to make sense, but I haven't seen them work in practice yet.
(Alchemy made a lot of sense, too!) People talk about the great
"automatic translation," but what I've seen is still a disaster. Some
time back, a former professor of mine worked his entire life on
automatic translation, only to declare it impossible in the end. The
best you could do was to create a text for a human to edit. As he said,
if you need the human to edit it anyway, why go through it in the first
place? He was speaking quite some time back. Automatic translation has
come some way, but it's not there yet. And neither is automatic subject
analysis.
We can experiment to our heart's content, but we cannot draw conclusions
based on ifs, maybes, and shoulds. We have seen that a "should" or a
"maybe" may come around in 50 or more years (if ever), or it may come
next week.
Regards,
Jim
James Weinheimer j.weinheimer_at_aur.edu
Director of Library and Information Services
The American University of Rome
via Pietro Roselli, 4
00153 Rome, Italy
voice- 011 39 06 58330919 ext. 327
fax-011 39 06 58330992
Received on Fri Aug 31 2007 - 10:18:57 EDT