Re: Resignation

From: Nancy Cochran <nancy.cochran_at_nyob>
Date: Fri, 31 Aug 2007 12:48:27 -0500
To: NGC4LIB_at_listserv.nd.edu

Please, what is the difference between Weinheimer Jim and James
Weinheimer?  I see that both are responding to "Resignation" and other
threads.

> [Original Message]
> From: Conal Tuohy <Conal.Tuohy_at_VUW.AC.NZ>
> To: <NGC4LIB_at_listserv.nd.edu>
> Date: 9/1/2007 10:31:44 AM
> Subject: Re: [NGC4LIB] Resignation
>
> Nathan, I think you are underestimating the difficulty of the experiment
> you are proposing. The difficulty springs from the requirement that the
> machine be able to read the works of the various David Johnsons. However,
> if someone would scan and OCR these works (or acquire full text from the
> publishers) then I think you are right that the rest would indeed be super
> easy.
>
> So better to forget the specific "David Johnson" example, and demonstrate
> the ability of automated methods using some existing full-text corpus. A
> number of researchers in the field of machine learning have already done
> this and written up impressive results in published papers. A few examples
> have already come up (such as Web of Knowledge). While I'm at it, another
> one I remembered reading is "The author-topic model for authors and
> documents" from http://portal.acm.org/citation.cfm?id=1036902&jmp=cit
>
> I think that, rather than the technology not yet being strictly feasible,
> the more important reasons why this technology is not already in more
> common use in libraries are:
>
> 1) a lack of full text (though full text IS available in some areas, it
> is often tied up in subscription databases such as those owned by Web of
> Knowledge)
> 2) a lack of library funding, CS expertise, interest, and even
> willingness to believe in the possibility (these things all go together)
>
> By contrast, some of these techniques are being actively developed by
> internet search providers (who have the advertising dollar to pay for it),
> and by IT vendors (who have an obvious interest), as well as by researchers
> in other spaces such as, interestingly, genomics, which also has to deal
> with large bodies of data which have been produced (by natural selection)
> without adequate metadata :-)
>
> BTW I don't see any irony in your proposed experiment relying on OCLC's
> authority work. Since the experiment was precisely to test the performance
> of machines in identifying authors, and your test dataset was precisely a
> set of authors defined by OCLC, I don't see how you can avoid making use of
> that human authority work in the experiment. Or was there some other irony
> I missed? :-)
>
> Cheers
>
> Con
>
> -----Original Message-----
> From: Next generation catalogs for libraries on behalf of Rinne, Nathan (ESC)
> Sent: Sat 01/09/07 2:18
> To: NGC4LIB_at_listserv.nd.edu
> Subject: Re: [NGC4LIB] Resignation
>
> Obviously, Jim is not one of the faithful.
>
> Let me repeat this:
>
> In order to help along the "doubting Thomases" among the catalogers, let
> me make a plea.  I think it should be super, super easy to do.  Why
> doesn't someone start with Conal Tuohy's claim about our current
> capabilities (using Bayesian statistics) re: all of the David Johnsons
> James Weinheimer informed us of?  I know something about science and
> research, so this ought to be easy enough to empirically test.  First,
> get all the works (only text, I assume?) of all the David Johnsons.  Of
> course, *ironically* [note: this is an addition to this quote] *in order
> to even get started here* I don't see how you would be able to avoid
> needing to use something like OCLC's Worldcat (made possible with its
> wonderful authority control, thank you!) in order to find most, if not
> all, of these works.  Then all you would need is to scan them and do the
> test, and find out if it worked or not.  I think this would be very,
> very helpful - and I want help.  Does anyone have the means of doing
> this? (end)
>
> Please note, this is not a demand, this is a request.  I think this would
> be very, very helpful.  And I think this would be very, very easy to do
> as well (maybe not over a lunch break, but you know what I mean).
>
> Regards,
> Nathan Rinne
> Media Cataloging Technician
> ISD 279 - Educational Service Center (ESC)
> 11200 93rd Ave. North
> Maple Grove, MN. 55369
> Work phone: 763-391-7183
>
>
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of James Weinheimer
> Sent: Friday, August 31, 2007 9:12 AM
> To: NGC4LIB_at_listserv.nd.edu
> Subject: Re: [NGC4LIB] Resignation
>
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Conal Tuohy
> > Sent: Friday, August 31, 2007 3:02 AM
> > To: NGC4LIB_at_listserv.nd.edu
> > Subject: Re: [NGC4LIB] Resignation
> > I'm assuming you're asking how a machine can decide that a given work
> > was authored by one (or none) of the above?
> >
> > If the full text of the books is available, this is actually quite a
> > feasible task which can be done by unsupervised machine-learning
> > algorithms. Every author has an authorial "fingerprint" which can be
> > recognised by attentive readers, and Bayesian statistical techniques are
> > even better at picking up such things. The key data for these algorithms
> > are the frequency of use and co-occurrences of particular words,
> > sentence-lengths, etc. It in no way requires AI capable of
> > "understanding" the subject of the text, in the sense that a human
> > reader can. The statistical patterns which these algorithms recognise
> > are ones which are generally below the conscious perception of human
> > readers (who instead tend to focus on what a text actually means).
> >
> > This is an area where we should expect computers to out-perform humans,
> > frankly.
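
To make the word-frequency idea above concrete, here is a minimal sketch of a Bayesian comparison of candidate authors. It is an illustration only and not the method anyone in this thread used: it swaps in a supervised multinomial naive Bayes classifier (the message above speaks of unsupervised algorithms), it uses word frequencies alone (no co-occurrences or sentence lengths), and the author labels `johnson_a` / `johnson_b`, the `SAMPLES` texts, and all function names are hypothetical, made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: two invented authors who share a name.
SAMPLES = {
    "johnson_a": [
        "the parish records show that the village grew slowly",
        "the treaty was signed and the war ended",
    ],
    "johnson_b": [
        "the reaction yields a stable compound at low temperature",
        "we measured the acid concentration in each sample",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """Build per-author word counts and the shared vocabulary."""
    counts = {author: Counter() for author in samples}
    for author, texts in samples.items():
        for text in texts:
            counts[author].update(tokenize(text))
    vocab = set()
    for author_counts in counts.values():
        vocab.update(author_counts)
    return counts, vocab


def log_likelihood(tokens, author_counts, vocab_size):
    """Log P(tokens | author) with add-one smoothing.

    One extra vocabulary slot is reserved for words never seen in training.
    """
    total = sum(author_counts.values())
    denom = total + vocab_size + 1
    return sum(math.log((author_counts[t] + 1) / denom) for t in tokens)


def attribute(text, counts, vocab):
    """Return (best_author, scores), assuming a uniform prior over authors."""
    tokens = tokenize(text)
    scores = {
        author: log_likelihood(tokens, author_counts, len(vocab))
        for author, author_counts in counts.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    counts, vocab = train(SAMPLES)
    best, scores = attribute(
        "the compound was measured at high temperature", counts, vocab
    )
    print(best, scores)
```

Note that this toy corpus separates the two authors mainly by subject vocabulary; two same-named authors writing in the same field would be a much harder case, and nothing here says how well the approach would do on real catalog data.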
>
> Then show us. I have read so many things of "should" in my life, and maybe
> some things seem to make sense, but I haven't seen them work in practice
> yet. (Alchemy made a lot of sense, too!) People talk about the great
> "automatic translation" but what I've seen is still a disaster. This was
> some time back, but a former professor of mine had worked his entire life
> on automatic translation, only to declare it impossible at the end. The
> best you could do was to create a text for a human to edit. Like he said,
> if you need the human to edit it anyway, why go through it in the first
> place? He was speaking quite some time back. Automatic translation has
> come some way, but it's not there yet. And neither is automatic subject
> analysis.
>
> We can experiment to our heart's desire, but we cannot draw conclusions
> based on ifs, maybes, and shoulds. We have seen that "should and maybe"
> may come around in 50 or more years--if ever, or it may be next week.
>
> Regards,
> Jim
>
> James Weinheimer  j.weinheimer_at_aur.edu
> Director of Library and Information Services
> The American University of Rome
> via Pietro Roselli, 4
> 00153 Rome, Italy
> voice- 011 39 06 58330919 ext. 327
> fax-011 39 06 58330992