Re: Resignation

From: Weinheimer Jim <j.weinheimer_at_nyob>
Date: Tue, 4 Sep 2007 09:24:46 +0200
To: NGC4LIB_at_listserv.nd.edu

> On Sat, 2007-09-01, Conal Tuohy wrote:

> Any system which has to guess the authorship of works will be liable to
> error (whether the system is human or artificial). If an author's name
> is common, the likelihood of error will be higher (again, this is true
> for humans as well as for computers). So differentiating the works of
> multiple David Johnsons will of course be harder than differentiating
> the works of multiple Jiamagurdni Smiths. I don't think anyone could
> dispute that, and it has nothing to do with the reason why I suggested
> forgetting David Johnson.

I was trying to say that when there is only a single Jiamagurdni Smith, the task is elementary. This can be done semi-automatically now. The problem is the multiple David Johnsons.

> The reason why I politely spurned the David Johnsons experiment was not
> because there are a lot of David Johnsons and this would be
> computationally hard (what do I care how hard it is? it's a computer's
> job, not mine), but because there are a lot of David Johnson books and I
> don't want to spend weeks scanning books, purely to show off
> computational work which is already documented in the scientific
> literature.

The question is not how hard it is; the question is whether the computer can do it correctly at all. I haven't seen it. Certainly, it can do the computations and spit out an answer, but is the answer correct? If the computer can handle only the easy cases, single names with no conflicts, how does that help anything at all in reality? That's why I added the example of the optical character recognition text from Boswell's Life of Johnson--the system didn't work. This is an example of something that is said to work at an accuracy level of 98% or so. I have reviewed a lot of OCR, and I question that level of accuracy in the real world. To me, it is primarily a marketing ploy built on theoretical constructs. And this is one of those examples where the errors are not at all simple for the user to see. If users saw them, they might look at the results of their keyword searches a bit more skeptically.

In my experience, determining which David Johnson, or which branch of the Russian Academy of Sciences, is responsible for an item is a very complex task, probably more difficult than determining that a certain arrangement of dots is an e and not a c in optical character recognition. This is why I am skeptical.

It may be a case where the computer does all these computations and the result then goes to a human editor who makes the final decision. Again, how does this help if the human has to do the work anyway?

Regards,
Jim Weinheimer