Re: Resignation

From: Mark Ehlert <ehler043_at_nyob>
Date: Fri, 31 Aug 2007 10:45:44 -0500
To: NGC4LIB_at_listserv.nd.edu
Nathan Rinne wrote:
>In order to help along the "doubting Thomases" among the catalogers, let
>me make a plea.  I think it should be super, super easy to do.  Why
>doesn't someone start with Conal Tuohy's claim about our current
>capabilities (using Bayesian statistics) re: all of the David Johnsons
>James Weinheimer informed us of?  I know something about science and
>research, so this ought to be easy enough to empirically test.  First,
>get all the works (only text, I assume?) of all the David Johnsons.  Of
>course, *ironically* [note: this is an addition to this quote] *in order
>to even get started here* I don't see how you would be able to avoid
>needing to use something like OCLC's Worldcat (made possible with its
>wonderful authority control, thank you!) in order to find most, if not
>all, of these works.  Then all you would need to scan them and do the
>test, and find out if it worked or not.  I think this would be very,
>very helpful - and I want help.  Does anyone have the means of doing
>this? (end)

Nathan,

I can see issues coming to the fore in the very beginning stages of
implementing such a case study.  First, a series of like-named
authors would have to have available for use works not covered by
copyright law, since the mere act of copying a whole work even for
research purposes may be in violation of it.  (I claim I'm not a
lawyer, merely ruminating briefly on what Google has gone (and is
still going?) through with publishers now, viz. the former's Books
project.)

Second, and touching upon the OCR topic brought up a while ago, is
the current state of finding a very high-quality text reader
(scanner) available for such a project.  Like astronomers with large
telescopes, those who need such equipment have to stand in line.

Nota bene that I base the above on the presumption that textual
material in book form is the source for this project.

A possibly better start, which would take quite a bit of work to
accomplish in its own right, is the rounding up of texts (of
significant length) by like-named authors already digitized and
confirmed to be accurate or damn near accurate in transcription, such
as may be found at Gutenberg or an institution's TEI work.  There are
also born-digital texts out there in open archives that may be
candidates for this as well, but finding a large enough sample to make
this worthwhile is something I'm unsure about.

Quickly written; pardon any semantic bumps in the road,
Mark


--
Mark K. Ehlert           University of Minnesota Libraries
Library Assistant 2      160 Wilson Library
Technical Services       309 19th Ave. S.
Phone: 612-625-4840      Minneapolis, MN 55455
E-mail: ehler043_at_umn.edu
Received on Fri Aug 31 2007 - 11:45:44 EDT