Re: Purposes of classification & Information imperialism

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Mon, 11 Jun 2007 13:12:02 -0400
To: NGC4LIB_at_listserv.nd.edu
Perhaps there is a way to automate it, to get something that's better
than nothing.

For instance, ClassificationWeb has a statistically generated mapping
from LCSH to both LCC and DDC.  DDC, as we know, has a very complete (if
not perfect) hierarchy. LCC has some hierarchy (although it's not
necessarily expressed in the class number, it's still _there_, you just
need the schedules to see it, you can't guess it from the class number).

So I can imagine some software that takes the LCSH, looks at its
corresponding DDC, looks at the DDC class number's 'parent', looks in
the other direction at the LCSH corresponding to that parent
DDC---bingo, you've established a hierarchical relationship. You run that
process over your whole corpus, you build a tree. Now, it's a heuristic
guess based on statistical correlations, not an exact guaranteed thing
that will always make sense.  But that's just a start. Perhaps an algorithm
could be written to take account of both the DDC _and_ the LCC
statistical correlations, and do other fancy things.
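
To make that concrete, here is a rough Python sketch of the DDC-only
version. Everything in it is an assumption for illustration: the sample
counts are made up (they are not ClassificationWeb data), the function
names are hypothetical, and the DDC 'parent' is approximated by simply
truncating the notation, which the real schedules do not always follow.

```python
# Hypothetical sample data: LCSH heading -> {DDC number: co-occurrence count}.
# Real correlations would have to come from a source like ClassificationWeb.
LCSH_TO_DDC = {
    "Music": {"780": 120, "781": 10},
    "Jazz": {"781.65": 40, "780": 3},
    "Bebop (Music)": {"781.655": 12},
}


def ddc_parent(number):
    """Approximate the next broader DDC number by truncating the notation."""
    if "." in number:
        # Drop the last decimal digit; drop the dot too if nothing follows it.
        return number[:-1].rstrip(".")
    digits = number.rstrip("0")
    if len(digits) <= 1:
        return None  # already a main class such as "700"
    return digits[:-1].ljust(3, "0")


def build_reverse_index(lcsh_to_ddc):
    """Map each DDC number to the heading most strongly correlated with it."""
    best = {}
    for heading, counts in lcsh_to_ddc.items():
        for ddc, n in counts.items():
            if ddc not in best or n > best[ddc][1]:
                best[ddc] = (heading, n)
    return {ddc: heading for ddc, (heading, _) in best.items()}


def infer_parents(lcsh_to_ddc):
    """Guess a parent heading for each heading by walking up the DDC notation."""
    ddc_to_lcsh = build_reverse_index(lcsh_to_ddc)
    parents = {}
    for heading, counts in lcsh_to_ddc.items():
        # Start from the DDC number this heading co-occurs with most often.
        ddc = ddc_parent(max(counts, key=counts.get))
        parent = None
        while ddc is not None:
            candidate = ddc_to_lcsh.get(ddc)
            if candidate is not None and candidate != heading:
                parent = candidate
                break
            ddc = ddc_parent(ddc)
        parents[heading] = parent
    return parents


if __name__ == "__main__":
    for child, parent in infer_parents(LCSH_TO_DDC).items():
        print(child, "->", parent)
```

On this toy data, "Bebop (Music)" lands under "Jazz" and "Jazz" under
"Music". Note that the walk skips DDC numbers with no correlated heading
(781.6 here), which is one of many places a real version would need
better judgment.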

Whatever you end up with will STILL just be an algorithmic heuristic guess.
But I think it would be a good one that could let us make interfaces of
use to our users. And there's no reason what you come up with as a start
couldn't be fixed by hand, as librarians run into things that need
fixing/improving.
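
As a sketch of what "fixed by hand" might look like in software, the
generated tree could be kept as a child -> parent map with librarian
corrections layered on top. The function names and sample headings below
are hypothetical, and the only safeguard shown is refusing a correction
that would make a heading its own ancestor.

```python
def creates_cycle(parents, child, new_parent):
    """Return True if making new_parent the parent of child would form a loop."""
    seen = set()
    node = new_parent
    while node is not None and node not in seen:
        if node == child:
            return True
        seen.add(node)
        node = parents.get(node)
    return False


def apply_corrections(generated, corrections):
    """Overlay hand corrections on a generated child -> parent map.

    Returns the merged map plus a list of corrections that were rejected
    because they would have introduced a cycle.
    """
    merged = dict(generated)
    rejected = []
    for child, new_parent in corrections.items():
        if creates_cycle(merged, child, new_parent):
            rejected.append((child, new_parent))
        else:
            merged[child] = new_parent
    return merged, rejected


if __name__ == "__main__":
    # Hypothetical machine-generated guesses (None means top of the tree).
    generated = {"Music": None, "Jazz": "Music", "Bebop (Music)": "Music"}
    # Hypothetical librarian edits: one sensible, one that would loop.
    corrections = {"Bebop (Music)": "Jazz", "Music": "Bebop (Music)"}
    merged, rejected = apply_corrections(generated, corrections)
    print(merged)
    print(rejected)
```

Keeping the corrections separate from the generated map means the
heuristic could be re-run later without losing anyone's hand edits.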

What would we need to try this out? Well, in addition to someone with
the time to try it out (funding; resources allocated to research), that
person needs access to: machine readable/parseable LCC schedules;
machine readable/parseable DDC schedules; the statistical correlations
in ClassificationWeb in machine readable/parseable format.  I don't
believe any of us 'ordinary' librarians and library workers have access
to those at any reasonable price.

Once you had this, the idea of librarians improving/fixing it little by
little requires a better/cheaper/easier infrastructure for collaborative
data editing than we have now.

One philosophy that underlies this whole proposal:  "Good enough" does
not always mean "perfect".  Settling for nothing rather than "perfect"
is not always the right solution. Requiring perfect means you sometimes
get very little, when it would be better to have more that's not perfect
but is good enough. Of course, the trick is deciding/agreeing how good
is good enough. But this applies to the whole plan, from the heuristic
algorithms based on statistical correlation, to the idea that it should
be easier and cheaper for more librarians to participate in
collaborative data correction/improvement.

Jonathan

David M Guion DMGUION wrote:
> Jonathan Rochkind said:
>
>
>> I think we need to investigate other ways to display these more specific
>> headings, not just a flat alphabetical list of 500 'more specific terms'
>>
>
> Many, many moons ago, I worked in a dental school library and used MeSH.
> That had a "tree structure" volume (I told you it was many, many moons
> ago--online was not yet thought of) that, if I recall correctly, placed
> every heading somewhere on a top-to-bottom hierarchy. I have ever since
> wished LCSH were organized that way.
>
> Here we are talking about next-generation catalogs, when there is a
> thirty-year-old example of how to structure subject headings that runs
> circles around what most of us have to use now. Alas, following that model
> would seem to entail redesigning the subject system from the ground up,
> which would cost gigabucks to do and more gigabucks to teach everyone how
> to use it.
>
> ^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*
> David Guion
> Music Cataloger
> University of North Carolina, Greensboro
> Jackson Library
> 320 College Ave.
> Greensboro, NC   27412
> (336) 334-5781
> dmguion_at_uncg.edu
>
> The early bird may get the worm, but the second mouse gets the cheese.
>
>

--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Received on Mon Jun 11 2007 - 11:15:38 EDT