Resignation

From: Rinne, Nathan (ESC) <RinneN_at_nyob> Date: Tue, 4 Sep 2007 10:21:06 -0500 To: NGC4LIB_at_listserv.nd.edu

James Weinheimer:

It may be a case of the computer does all these computations and it goes
to a human editor who makes the final decision. Again, how does this
help if the human has to do the work anyway? (end)

Jim, I think this comes back to my post asking how far apart you and
Jonathan were in your views.  Isn't there a possible time-saving element
here that could be acknowledged?  I.e., the computer - based on its
particular abilities - might be able to spit out some pretty good
guesses - which the cataloger would be able to check and verify?  I'm
really not eager to spawn whole new threads of what increasingly looks
to me like "talking past each other", but I think your further thoughts
here might be valuable and illuminating.

I'd say you've demonstrated great thoughtfulness with your content-laden
replies and probing questions - and I hope others feel the same.  Thank
you.

Also, I am still puzzled as to why the potency and importance of the
"David Johnson" example and my research suggestion is evidently so
difficult to grasp.  To me, it seems this would include a crucial
necessary control element - allowing for the production of concrete,
measurable data in the study which could be examined by parties that
don't have such a stake in its results.

Alexander, when I say that in order to do such a study one would have to
rely on proven authority work, I think you misunderstand me.  By saying
that, I am *not saying* that we in the LIS
profession should not take what we've done so far and push it further -
I don't dispute this.  I am simply stating that in order to get started
with such a study (which again, would be able to test the abilities of
certain computer software to accurately distinguish many authors with
the same name), one would need to get a fairly good representation of
the works by the David Johnsons out there, and therefore would need to
use something like OCLC WorldCat, with its accumulation of
human-produced authority records.  As I've argued before, I think this
kind of stuff is kind of the core of the library profession, and I also
think that in the future, its most useful services are likely to derive
from this sort of work.
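
As a rough illustration of what "concrete, measurable data" could look
like in such a study, here is a minimal sketch in Python.  It assumes
the human-produced authority records serve as the reference answer and
scores the software's groupings by pairwise precision and recall.  The
record IDs, labels, and function names are all hypothetical; none of
them come from WorldCat or from any real disambiguation system.

```python
from itertools import combinations


def same_author_pairs(assignment):
    """Return the set of record pairs that an assignment puts under one author."""
    pairs = set()
    for a, b in combinations(sorted(assignment), 2):
        if assignment[a] == assignment[b]:
            pairs.add((a, b))
    return pairs


def pairwise_scores(gold, predicted):
    """Pairwise precision and recall of predicted groupings against gold ones."""
    gold_pairs = same_author_pairs(gold)
    pred_pairs = same_author_pairs(predicted)
    correct = gold_pairs & pred_pairs
    precision = len(correct) / len(pred_pairs) if pred_pairs else 1.0
    recall = len(correct) / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Hypothetical sample: four records that all carry the name "David Johnson".
# gold: which authority record a human cataloger attached to each item.
gold = {"rec1": "authority_A", "rec2": "authority_A",
        "rec3": "authority_B", "rec4": "authority_B"}
# predicted: the groups a (hypothetical) program produced for the same items.
predicted = {"rec1": "group_1", "rec2": "group_1",
             "rec3": "group_1", "rec4": "group_2"}

precision, recall = pairwise_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up sample the program wrongly merges rec3 with the first
David Johnson and fails to join rec3 with rec4, so both figures fall
below 1.  Numbers of this kind could be examined by parties with no
stake in the outcome.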

Also, in the flurry of postings, I apologize that I must have missed the
"paper [you] pointed to which did auto-classification across Project
Gutenberg [and] chose LCSH specifically so that librarians could verify
the result."  Do you happen to have a ref or link to that paper?

Rob Styles, in another post, says "When we go to the doctor we trust
them to understand our problem and prescribe the appropriate treatment.
If we find we no longer trust the doctor's judgement then we find a new
doctor. Surely the same should be true of any expert we employ?" (end)

This issue of trust is so interesting.  While not discounting the point
Rob was making here about the importance of acknowledging a person's
perceptions, I find it very interesting to note that I may not trust a
certain doctor, but in reality, he might be far more trustworthy than
the one I feel drawn to for whatever reason.

I don't have an unshakeable "faith" in things like LCSH.  I just think
that they have proven themselves to be very helpful in navigating the
world out there - even if the quality help they do deliver has to this
point only been accessible to great reference librarians like Thomas Mann (who
deals primarily with the humanities and social sciences) - and I haven't
been able to do so much of the best searching myself.

I do want to be involved in "future-oriented thinking" as Karen Coyle
(by the way, I hope you will "waste [more] time" here Karen) says - but
maybe there are some who are "doubting Thomases" about that one too. :)

But how could I prove it to you?  :)

Regards,
Nathan Rinne
Media Cataloging Technician
ISD 279 - Educational Service Center (ESC)
11200 93rd Ave. North
Maple Grove, MN. 55369
Work phone: 763-391-7183

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Weinheimer Jim
Sent: Tuesday, September 04, 2007 2:25 AM
To: NGC4LIB_at_listserv.nd.edu
Subject: Re: [NGC4LIB] Resignation

> On Sat, 2007-09-01, Conal Tuohy wrote:

> Any system which has to guess the authorship of works will be liable to
> error (whether the system is human or artificial). If an author's name
> is common, the likelihood of error will be higher (again, this is true
> for humans as well as for computers). So differentiating the works of
> multiple David Johnsons will of course be harder than differentiating
> the works of multiple Jiamagurdni Smiths. I don't think anyone could
> dispute that, and it has nothing to do with the reason why I suggested
> forgetting David Johnson.

I was trying to say that when there is a single Jiamagurdni Smith,
the task is elementary. This can be done semi-automatically now. The
problem is the multiple David Johnsons.

> The reason why I politely spurned the David Johnsons experiment was not
> because there are a lot of David Johnsons and this would be
> computationally hard (what do I care how hard it is? it's a computer's
> job, not mine), but because there are a lot of David Johnson books and I
> don't want to spend weeks scanning books, purely to show off
> computational work which is already documented in the scientific
> literature.

The question is not how hard it is, the question is: can the computer do
it at all correctly? I haven't seen it. Certainly, it can do the
computations and spit out an answer, but is it correct?  If the computer
can do only the easy cases for single names with no conflicts, how does
that help anything at all in reality? That's why I added the example of
the optical character recognition text from Boswell's life of
Johnson--the system didn't work. This is an example of something that is
said to have worked at a level of accuracy of 98% or so. I have
reviewed a lot of OCR, and I question that level of accuracy in the real
world. To me, it is primarily a marketing ploy built on theoretical
constructs. And this is one of those examples where it's not at all
simple for the user to see. If they saw it, they may look at the results
of their keyword searches a bit more skeptically.

In my experience, determining which David Johnson, or which branch of
the Russian Academy of Sciences is responsible for an item is a very
complex task, probably more difficult than determining that a certain
arrangement of dots is an e and not a c in optical character
recognition. This is why I am skeptical.

It may be a case of the computer does all these computations and it goes
to a human editor who makes the final decision. Again, how does this
help if the human has to do the work anyway?

Regards,
Jim Weinheimer