Re: Purposes of classification & Information imperialism

From: Ted P Gemberling <tgemberl_at_nyob> Date: Fri, 8 Jun 2007 15:24:23 -0500 To: NGC4LIB_at_listserv.nd.edu

Jonathan,
At this point, I don't think anyone is stereotyping your views as "all
you need is Google." I think most of us on the list know that you're not
pushing for that sort of over-simplification. You recognize the value of
controlled vocabulary.

You said that Nathan's list is way too long for most users. But as I
think he was suggesting in response, should our catalogs be designed for
beginning users or "the most demanding [or advanced] users"? The latter
are the people we'd like the beginning users to grow into eventually, at
least in academic libraries. But I realize you're making another point:
that for a lot of users, a list like that might be bewildering in its
complexity. It seems that's the role of the reference librarian, though,
to help people work their way through confusing things. I know you're
not saying everything should be in the reach of a Google search box, but
there does still seem to be something of an assumption that people
should be able to figure everything out from their desktop. They
shouldn't have to ask anyone for help.

There was another point I thought I should get into related to
Bernhard's posts. It seems there is a sort of "information imperialism"
in expecting the whole world to do things the way LC does them. And
sometimes I wonder if libraries around the world adopt LC practices, not
because they're better, but simply because of the magnitude of LC, its
systems and collections, and worldwide influence. At some point they
just decide the easiest thing to do is "go with the trend" and adopt
LC's system. It's unfortunate if a lot of work they did before on other
systems is lost that way. Hopefully some of it will be recouped later,
when the dominant system has to be adjusted and their insights can be
brought in.

This does kind of relate, in a way, to Nathan's point, too. (I don't
mean to imply Nathan would agree with this, but I think it's related.) I
think there is a legitimate question "why reinvent the wheel?" if
someone else has already created a system that provides sophisticated
access. Especially if it will be more expensive for you to create
something else. Why create a new classification system or subject
heading system if there are already some very full ones? That is, unless
you really want something that costs less. I'm not saying cost is your
motivation, Jonathan, though I wonder if it is for some who call for new
systems. Some people really would like to base it all around that Google
box, or something similar to it.
        --Ted Gemberling

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Thursday, June 07, 2007 10:33 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Purposes of classification (was Re: [NGC4LIB]
Aristotle, "Everything is Miscellaneous", and the lib's "educative
function" )

PS: I'd add that that list from Nathan is actually a good example of
what I'm talking about. That list (which is just an excerpt of
_highlights_) is WAY too long for most users. It's huge. And some of
them are at very different levels of granularity than others.  I would
like my controlled vocabulary to show the relations between those terms,
the groupings they fall into, etc. If I don't know what one of them
means, those kinds of features can easily and quickly help me put the
term in context.

And I believe that list is only of _subdivisions_ of Linguistics
(although for some reason the listing has omitted the conventional "--"
separators, making the actual semantic content even more confusing).
That's different from 'narrower terms'. I don't believe I currently
have access to any interface that lets me find any "NT"s of
"Linguistics", OR of any of those subdivisions. Mapping them all out would
probably give us a quite crazy graph.  Not to mention the well-loved "&"
LCSH's, which are not in fact subdivisions, like "Linguistics &
philosophy".  Oops, but sometimes it's "and" instead of "&":
"Linguistics and communism".

Those headings were designed to be viewed in a single alphabetic list.
When that was the only way to present headings, they had to be designed
that way, and they do a good job of it. It's not the only way to present
headings anymore, but to do something different with LCSH is like
fighting with LCSH.

PPS: If you think my point is "So all we need is Google", you are
missing it. That's not at all what I'm saying. We need a structured
controlled indexing language, professionally maintained and applied (_in
addition to_ algorithmic analysis of full text and metadata AND
"tagging").  But we need to start analyzing WHAT we need from an
indexing language (some of which we've got, some of which we don't, and
it's really a matter of degree anyway, not on or off), WHY we need those
things (what features we want to support), and HOW to get there.  Not
just defending LCSH and LCC and DDC as if they were sent down from
heaven or something.   Even Ranganathan isn't actually God.  We are
allowed to critically examine these things, 100 years after they were
founded, for a VERY different environment than the one we are now in.
That doesn't mean we need to throw the baby out with the bathwater. But
let's not leave the baby soaking in 100 year old fetid bathwater either,
eh?

Jonathan

Jonathan Rochkind wrote:
> Certainly LCSH allows one to do all sorts of things, especially if the
> system supports it wisely.
>
> I think the lack of a good hierarchical structure gets in the way of
> doing a bunch of things with LCSH, though. I believe someone wrote
> recently that LCSH, even if you pay attention to the BT/NT 'thesaural'
> relationships (which most of our systems don't, and which they would
> need access to full LCSH authority MARC to do), has around
> 30,000 "top level" terms. On top of this, we have hierarchy implemented
> in several different ways in LCSH (thesaural relationships,
> subdivisions, alphabetic proximity, inverse phrasing).
>
> This makes it difficult to provide the 'lay of the land' view, or to
> segment a large result set. I want, for instance, my system to have
> this kind of a 'dialog' with the user: Gee, you got 500
> results for 'apache'; 200 of them are about Native Americans, 100 of
> them are about military helicopters, and 100 of them are about web
> servers. Oh, you're interested in military helicopters? Okay, within
> that, some of them are about engineering, some of them are about
> military history. Oh, you're interested in military history? Some of
> those are about Vietnam, some of those are about... whatever. How can I
> do this with LCSH? I can try, but the lack of a consistent and
> systematic hierarchical structure (with a more reasonable number of top
> level terms--I don't know that they need to fit on 'two pages', the
> number of printed pages they fit on is irrelevant; what 'reasonable' is
> is yet to be determined, but I know it's not 30000) gets in the way.
>
> To be sure, LCSH can be used in all sorts of ways. I'm not trying to say
> it's garbage or something. I am, however, interested in exploring both:
> 1) What can we make our systems do with LCSH (and with DDC, and with
> LCC, and with anything else we've already got a lot of assignments for)
> beyond what we are currently doing, to aid the user?
> 2) What walls do we hit when we try to do some of this stuff, what do we
> wish we could do better than these existing vocabulary/indexing systems
> will let us do---and WHY, and what can be done (on a theoretical and
> practical level) to ameliorate this?
>
> I think there are useful things that can be done in these directions,
> that you can only start on once you stop defensively claiming that LCSH (or
> LCC or DDC) are somehow perfect systems which can do everything just
> perfectly.
>
> Jonathan
>
> Rinne, Nathan (ESC) wrote:
>> Jonathan:
>>
>> Jonathan said:
>>
>> "but in fact, we want and NEED a classification (NOT just tagging, but a:
>> _controlled_ vocabulary; of subject, disciplinary, and genre
>> characteristics; with relations between terms of hierarchy, association,
>> and possibly other relation types---that is, a classification)--for
>> reasons other than shelf order. These reasons include but are not
>> limited to:
>> * Bringing like things together in multiple ways in an interface that is
>> not the shelf.
>> * Allowing people to understand what is in a large corpus, or large
>> result set, by categorizing it in sets--to get a 'lay of the land'.
>> * To find more things like a thing already found
>> * To narrow or broaden one's search when one realizes that one needs
>> more focused or more general materials."
>>
>> Jonathan, of course all the good points that you make here are already
>> things that LCSH allows researchers looking for substantial materials to
>> do (with non-fiction *books*, not individual articles).  It may not
>> allow them to do this to the extent that we want (i.e. all at once on
>> the screen) and many researchers may not know how to do it - but all of
>> this can already be done using good library catalogs:
>>
>> * LCSH allows things to be explored (by clicking from here, to there, to
>> there) in multiple ways in an interface that is not the shelf.  There is
>> a holistic matrix of interconnectivity and interdisciplinarity that
>> already exists.
>>
>> (for example, with "linguistics" in LC's catalog, I can find books also
>> about "World politics--1989-", "ontology", "English language--Old
>> English, ca. 450-1100", "Home and school", "Arabic language--Social
>> aspects", "Oral tradition", "Science", "Antiquities", "Truthfulness and
>> falsehood", "Literature--History and criticism", "African languages",
>> "Language and culture", "Aesthetics", "Metamathematics", "Logic,
>> Symbolic and mathematical", "Archaeology", "Physical anthropology",
>> "Chomsky, Noam", etc.)
>>
>> * LCSH browse displays alphabetically categorize subject headings and
>> allow people to see what is available in a certain disciplinary area,
>> subject, or topic - to get a "lay of the land"
>>
>> (Highlights I was able to quickly get from LC browse list:
>> Linguistics
>> Linguistics Bibliography. (78 hits)
>> Linguistics China. (30 hits)
>> Linguistics Congresses. (410 hits)
>> Linguistics Dictionaries. (57 hits)
>> Linguistics Dictionaries Arabic. (7 hits)
>> Linguistics, Experimental (10 hits)
>> Linguistics Field work (11 hits)
>> Linguistics Germany History (7 hits)
>> Linguistics Handbooks, manuals, etc. (12 hits)
>> Linguistics Historiography. (19 hits)
>> Linguistics History (166 hits)
>> Linguistics History 19th century (32 hits)
>> Linguistics History 20th century (51 hits)
>> Linguistics Methodology (104 hits)
>> Linguistics Methodology Handbooks, manuals, etc. (10 hits)
>> Linguistics Periodicals. (310 hits)
>> Linguistics Philosophy. (43 hits)
>> Linguistics Problems, exercises, etc. (15 hits)
>> Linguistics Research Hungary History. (5 hits)
>> Linguistics Research Soviet Union (15 hits)
>> Linguistics Statistical methods (51 hits)
>> Linguistics Terminology. (50 hits)
>>
>> Note: Google might be interested in "Linguistics Statistical methods",
>> don't you think?  Also, as Thomas Mann has noted, it would not be hard to
>> put popular and substantial web resources [or things like blogs even] in
>> this list)
>>
>> * Obviously, LCSH allows a person to find more things like a thing
>> already found - and tags and user recommendations would only *increase*
>> the possibilities - even for research work, increasingly
>> interdisciplinary as it is.
>>
>> * In a good catalog, the search can be narrowed by clicking on the
>> subject headings in the browse list.
>>
>> (For example, click on "linguistics" in LC's catalog and you get the
>> following narrower terms:
>>
>> Acceptability (Linguistics)
>> Analogy (Linguistics)
>> Anaphora (Linguistics)
>> Anthropological linguistics
>> Applied linguistics
>> Archaisms (Linguistics)
>> Areal linguistics
>> Asymmetry (Linguistics)
>> Binary principle (Linguistics)
>> Biolinguistics
>> Classification Books Linguistics
>> Classifiers (Linguistics)
>> Code switching (Linguistics)
>> Communism and linguistics
>> Componential analysis (Linguistics)
>> Connotation (Linguistics)
>> Context (Linguistics)
>> Contrastive linguistics
>> Creativity (Linguistics)
>> Deep structure (Linguistics)
>> Diglossia (Linguistics)
>> Distinctive features (Linguistics)
>> Economy (Linguistics)
>> Emphasis (Linguistics)
>> Field theory (Linguistics)
>> Formalization (Linguistics)
>> Functionalism (Linguistics)
>> Grammar, Comparative and general
>> Grammaticality (Linguistics)
>> Graphemics
>> Hesitation form (Linguistics)
>> Hierarchy (Linguistics)
>> Historical linguistics
>> Idioms
>> Juncture (Linguistics)
>> Linguistic models
>> Markedness (Linguistics)
>> Mathematical linguistics
>> Minimal pair (Linguistics)
>> Modality (Linguistics)
>> Neurolinguistics
>> Neutralization (Linguistics)
>> Paralinguistics
>> Parallelism (Linguistics)
>> Phonetics
>> Prosodic analysis (Linguistics)
>> Psycholinguistics
>> Redundancy (Linguistics)
>> Reference (Linguistics)
>> Register (Linguistics)
>> Sociolinguistics
>> Speech acts (Linguistics)
>> Structural linguistics
>> Substratum (Linguistics)
>> Surface structure (Linguistics)
>> Transmutation (Linguistics)
>> Typology (Linguistics)
>> Universals (Linguistics)
>> Word (Linguistics)
>> Government-binding theory (Linguistics)
>> Cohesion (Linguistics)
>> Autosegmental theory (Linguistics)
>> Definiteness (Linguistics)
>> Naturalness (Linguistics)
>> Pejoration (Linguistics)
>> Paradigm (Linguistics)
>> Genericalness (Linguistics)
>> Forensic linguistics
>> Iconicity (Linguistics)
>> Scope (Linguistics)
>> Ecolinguistics
>> Sequence (Linguistics)
>> Perspective (Linguistics)
>> Fossilization (Linguistics)
>> Motion in language
>> Direction in language
>> Politeness (Linguistics)
>> Subjectivity (Linguistics)
>> Opacity (Linguistics)
>> Gradience (Linguistics)
>>
>> The "broader term", Language and Languages, is currently unavailable
>> [needs more funding]...)
>>
>> Obviously, picking linguistics also helps me show the importance of
>> having some highly trained people doing cataloging work for this or that
>> disciplinary niche.  Certainly, there are many subjects that have more
>> "popular" narrower terms, for example, as well!
>>
>> Therefore, these incredible services are all available, to some extent,
>> now.  As it stands, though, perhaps it takes curious people who are
>> former detectives (like Thomas Mann at the LOC) - and perhaps those with
>> a solid liberal arts education - to really utilize them to their fullest
>> extent.  Things can almost certainly be made much easier, as Andrew Pace
>> and Erik Hatcher have shown, perhaps also with changes to MARC
>> format.
>>
>>
>> Of course, in order to make this work, I think we need more quality
>> tagging not just from users, but catalogers (LCSH) as well.
>>
>> Nathan Rinne
>> Media Cataloging Technician
>> ISD 279 - Educational Service Center (ESC)
>> 11200 93rd Ave. North
>> Maple Grove, MN. 55369
>> Work phone: 763-391-7183
>>
>> -----Original Message-----
>> From: Next generation catalogs for libraries
>> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Jonathan Rochkind
>> Sent: Wednesday, June 06, 2007 9:42 PM
>> To: NGC4LIB_at_listserv.nd.edu
>> Subject: [NGC4LIB] Purposes of classification (was Re: [NGC4LIB]
>> Aristotle, "Everything is Miscellaneous", and the lib's "educative
>> function" )
>>
>> What are the points of a classification? I submit that there are
>> several. And only ONE of them requires the kind of compact notation that
>> Bernhard assumes--shelf order. Certainly, as long as we need to put
>> books on shelves (and I think this will be for a long, long time) we will
>> need a shelf order. And so long as we need a shelf order, it serves us
>> well to put like books together (recognizing that books can be like and
>> unlike in many different ways, along many different axes--but we still
>> need to pick just one for a shelf order. That's just the way the
>> physical world works; no problem).
>>
>> This is all true, and just the way it is.
>>
>> but in fact, we want and NEED a classification (NOT just tagging, but a:
>> _controlled_ vocabulary; of subject, disciplinary, and genre
>> characteristics; with relations between terms of hierarchy, association,
>> and possibly other relation types---that is, a classification)--for
>> reasons other than shelf order. These reasons include but are not
>> limited to:
>> * Bringing like things together in multiple ways in an interface that is
>> not the shelf.
>> * Allowing people to understand what is in a large corpus, or large
>> result set, by categorizing it in sets--to get a 'lay of the land'.
>> * To find more things like a thing already found
>> * To narrow or broaden one's search when one realizes that one needs
>> more focused or more general materials.
>>
>> None of these purposes, in and of themselves, in fact require a
>> notation suitable for shelf ordering. What DO they require, especially
>> when we are trying to fulfill these purposes in a digital interface?
>> What sorts of interfaces might we want to present to users, and what
>> _formal features_ of a classificatory controlled vocabulary assist or
>> get in the way of providing them?
>>
>> This is what we need to discover, by empirical research as well as
>> intellectual analysis.
>>
>> LCC or Dewey (or LCSH) are not the end of the road. They are the
>> beginning. They were designed for an environment we are no longer
>> constrained to. We can do more. What more can we do with these
>> existing systems? What might we WISH to do that these existing systems
>> won't let us do in a reasonable, systematic way (because if you aren't
>> going to be systematic and reasonably consistent, then you might as
>> well just be using tagging, indeed)?  This is what we need to discover.
>>
>> Jonathan
>>
>> Bernhard Eversberg wrote:
>>
>>> Tim Spalding wrote:
>>>
>>>> 1. The organization into tens is arbitrary and limiting. The "tree of
>>>> knowledge" (if there is a tree) is on no better terms with ten than
>>>> time is with twelve. These are arbitrary; Dewey uses tens to make
>>>> numbers shorter and nothing else. Every level has a choice,
>>>> Procrustean hilarity.
>>>>
>>>>
>>> So, what might be a good number for the first level of a new
>>> classification? If we agree, that is, that we need a new one.
>>> Under 25? Then we might use a letter for a code.
>>> More, up to 100 perhaps? Then a 2-digit-number might be appropriate.
>>> (In the Netherlands and in Germany, the Dutch "Basisclassificatie"
>>> is widely used. It has 89 main classes, each with fewer than 100
>>> subclasses. Notations thus look like this:
>>>   54.72 Artificial Intelligence (54 = Computer Science)
>>> (The level of detail is of course much lower than Dewey's, but its aim
>>> is not to replace Dewey but to provide a broad categorization. It
>>> can be useful to refine keyword searches or to arrange large
>>> result sets into manageable chapters. The aim is not to sort the
>>> world out but to arrange sets of documents.)
>>>
>>> With this question sorted out, then what headlines (broad subject
>>> categories) might be appropriate for our time and age?
>>>
>>>
>>> I mean, why not take this on now and make an attempt to define
>>> at least the top level of a new classification - if all existing
>>> systems are as deficient as they appear to be. There must be
>>> some approaches somewhere already - maybe even a good one.
>>> Below the top level, there may be existing subject classifications
>>> that could be re-used here. At least in some subjects, like
>>> mathematics or physics.
>>>
>>> B. Eversberg
>>>
>>>
>>
>> --
>> Jonathan Rochkind
>> Sr. Programmer/Analyst
>> The Sheridan Libraries
>> Johns Hopkins University
>> 410.516.8886
>> rochkind (at) jhu.edu
>>
>>
>