Jonathan Rochkind wrote:
I'm also interested in someone exploring what makes a controlled
vocabulary suited for the kind of facetted exploration we are talking
about...
...The FAST project is to some extent an attempt to make LCSH more
amenable to this type of interface, but to my mind it's only the barest
beginning of what's possible. It's not backed by much empirical
research. It loses some information from LCSH that I'm not convinced you
_need_ to lose (is it possible to keep this information intact and even
present it in an easy to use manner?). And it's generally presented as a
project to "make subject cataloging easier for paraprofessionals",
rather than "make subject controlled vocabulary work better in easier to
use interfaces." It's the latter I'm interested in, not the former. I
think a lot more than FAST, and not necessarily along the lines of FAST
either, is possible.
***
I agree with Jonathan that it would be good to see more discussion of
ways that LCSH could be modified to work better with facet-based access.
I think this could potentially be a very effective and useful
development, but it could get very complicated quickly. Anyway, here are
a few thoughts, in no particular order, that come to my mind when
thinking about LCSH and things that might affect our ability to provide
useful facets.
I don't know of much work being done on this (other than FAST, which as
Jonathan pointed out, has a different purpose), but there is an
interesting article by James D. Anderson and Melissa A. Hofmann (in
Cataloging & Classification Quarterly v. 43, no. 1, p. 7-38) about
faceting existing LCSH headings, which seems to primarily consist of
assigning existing LCSH terms to more specific facets, but provides some
nice theoretical background and a little history of faceting in library
classification. They put a lot of emphasis on the syntax of the
different facets (not just having one topical facet like LCSH, but a
number of them including thing/entity, kind, part, client, product,
agent/means, etc.). This seems somewhat different from the kind of
faceting Endeca, etc. are doing.
I do think FAST does some things that are improvements not only for
input, but for access. For example, FAST provides hierarchical access to
the geographic facet and uses exact dates for the time facet. One of the
weaknesses of our current system is that it is hard to move up and down
levels of specificity because the syndetic structure is incomplete, the
information is not always coded with enough granularity, and our systems
don't support this type of navigation. I might want to search for
everything about communicable diseases in Kenya or AIDS in Sub-Saharan
Africa, but for either of those searches, it's hard to see how one
could do a search that wouldn't be labor-intensive and involve manually
looking for a lot of narrower terms and potential combinations. If
people start at a certain basic level, how do you help them easily move
up and down the chain of specificity to find what they need?
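One way to picture this kind of navigation is to expand each term in a search to everything beneath it before searching. The following is a minimal Python sketch, assuming a complete narrower-term table were available. The NARROWER table and its links are invented for illustration and are not taken from LCSH or FAST.

```python
from itertools import product

# Hypothetical narrower-term (NT) links, for illustration only.
NARROWER = {
    "Communicable diseases": ["AIDS (Disease)", "Malaria"],
    "Africa, Sub-Saharan": ["Kenya", "Tanzania"],
    "Kenya": ["Nairobi (Kenya)"],
}

def expand(term, narrower=NARROWER):
    """Return the term plus every term reachable through NT links."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the hierarchy
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen

def expanded_queries(topic, place):
    """All (topic, place) pairs at or below the requested level."""
    return set(product(expand(topic), expand(place)))

# "Everything about communicable diseases in Kenya"
queries = expanded_queries("Communicable diseases", "Kenya")
```

With such a table, the searcher would name only the broad topic and place, and the system would supply the narrower combinations that otherwise have to be looked up by hand.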
Anderson and Hofmann make the very good point that, although LCSH is
supposed to provide subject analysis that is coextensive with the topic
of the work, this is undermined by the use of more than one heading to
approximate the coextensive subject. This scattering effect is
exacerbated when you have multiple works on one record. One problem we
have had recently is not being able to teach a computer to correlate
instrumentation with genre in some cases (e.g., subjects like "Sonatas
(Violin and piano)" vs. the combination of "Waltzes" and "Violin and
piano music").
Casey gave some good examples of compound headings like "Cookery,
French" and "Painting, French" that combine two different topics when it
might be desirable to be able to facet them separately. I have tended to
think of this in terms of headings on individual biographies like
"African American women poets" which are supposed to be accompanied by a
more general term (H 1330 in LC's subject cataloging manual), but those
instructions are often overlooked so you end up with those biographies
being segregated out from the mass of American poets (at least in a
browse list where they are under "Poets, American"), and certainly from
the still broader term "Authors, American." I sometimes think that
factoring a heading into its most specific parts (e.g., African
Americans + women + poets), each of which could have its own broader
and narrower terms, could be a lot more flexible, since you could then
have any combination of terms you wanted. A disadvantage of
this sort of faceting, as Martha Yee pointed out in her article "Two
Genre and Form Lists for Moving Image and Broadcast Materials: A
Comparison" (Cataloging & Classification Quarterly, v. 31, no. 3/4, p.
237-295), is that it can result in artificial terms (e.g.
"Gangster-Feature") that don't resemble users' vocabulary. However, I
think it would be possible to map individual factors to phrases that are
more likely to be known to users.
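The factoring idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the FACTORS and BROADER tables, and the way each heading is split, are hypothetical and do not come from LCSH, FAST, or Anderson and Hofmann.

```python
# Hypothetical factor assignments and broader-term links.
FACTORS = {
    "African American women poets": {"African Americans", "Women", "Poets"},
    "Poets, American": {"Americans", "Poets"},
    "Authors, American": {"Americans", "Authors"},
}
BROADER = {"Poets": "Authors", "African Americans": "Americans"}

def with_broader(factors, broader=BROADER):
    """Add every broader factor so narrower headings match broader queries."""
    result = set(factors)
    for factor in factors:
        seen = set()
        while factor in broader and factor not in seen:
            seen.add(factor)
            factor = broader[factor]
            result.add(factor)
    return result

def matching_headings(query, factors=FACTORS):
    """Headings whose expanded factors include every factor in the query."""
    return sorted(h for h, f in factors.items() if set(query) <= with_broader(f))

# Map a set of factors back to a phrase users are likely to know,
# falling back to the bare factors when no phrase is on file.
DISPLAY = {frozenset(f): h for h, f in FACTORS.items()}

def display_phrase(factors):
    return DISPLAY.get(frozenset(factors), " + ".join(sorted(factors)))
```

Under these assumptions, matching_headings({"Americans", "Poets"}) would also pick up "African American women poets", so those biographies would no longer be segregated from the other American poets, and display_phrase would show the familiar phrase rather than an artificial string.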
Implied facets are another stumbling block. They are hard to ferret out
unless you happen across them, and they are often mishandled because
catalogers overlook the instructions. One I ran across recently is the
fact that Germany is not to be used under "National socialism."
"National socialism" = Nazism in general and Nazism in Germany as a
whole, while "National socialism-Germany" is only supposed to be used
when it's further subdivided by a more specific place such as Berlin.
Similarly, the geographical subdivision for the United States is implied
in the heading "African Americans." There actually is a fair amount of
"implied" data
in catalog records that works against us in a computer-based searching
environment.
I think there could also be a place for broad-bucket headings for things
like major literary forms. If you look at something like NCSU's Endeca
catalog, many searches result in a form facet like "Fiction" or
"Poetry," which is misleading because they don't include all the fiction
or poetry, only those works (mostly newer fiction) that happen to have
some topical heading subdivided by "Fiction" or "Poetry."
Specificity of subject headings is potentially a great strength of LCSH,
but in our current browse lists it often leads to a frustrating amount
of fragmentation. Our local public library used to offer only a subject
keyword search that included the x-refs from the authority file (good),
but would only give you back a list of headings (sometimes bad, like
when I just wanted to see the most recent plumbing books without having
to go into each individual variation, such as "Plumbing,"
"Plumbing-Amateurs' manuals" and "Plumbing-Repair," and sort by date).
Now they've got the opposite approach, where you can only keyword search
the authorized subjects in the bibliographic records, but totally miss
any x-refs.
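The plumbing case could be handled by collapsing subdivided headings onto their main heading and sorting the combined results by date. A minimal Python sketch follows. The records, titles, and years are invented, and it assumes subdivisions are separated by a double hyphen ("--"), as they often are in catalog displays, rather than the single hyphen used in this message.

```python
# Hypothetical records; titles and years are invented for illustration.
RECORDS = [
    {"title": "Book A", "year": 1998, "subjects": ["Plumbing"]},
    {"title": "Book B", "year": 2006, "subjects": ["Plumbing--Amateurs' manuals"]},
    {"title": "Book C", "year": 2003, "subjects": ["Plumbing--Repair"]},
    {"title": "Book D", "year": 2005, "subjects": ["Heating--Repair"]},
]

def main_heading(subject, sep="--"):
    """Return the part of a subject string before its first subdivision."""
    return subject.split(sep, 1)[0].strip()

def newest_first(records, heading):
    """Records with any subject under the given main heading, newest first."""
    hits = [r for r in records
            if any(main_heading(s) == heading for s in r["subjects"])]
    return sorted(hits, key=lambda r: r["year"], reverse=True)

titles = [r["title"] for r in newest_first(RECORDS, "Plumbing")]
```

A keyword search that also consulted the x-refs could feed its matched authorized heading into the same function, which would combine the strengths of the two approaches described above.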
As far as broader terms and narrower terms go, I remember some exercise
in library school where we had to trace a term (some abstract noun) up
through its broader terms, only to arrive once again where we began. I
don't know if that's been fixed, but there are probably more than a few
kinks in LCSH's BT/NT system.
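Loops like this could be found mechanically by following the broader-term links and watching for a term that reappears on the current path. Here is a small Python sketch of that check, a depth-first search. The term names in the example are placeholders, not real headings.

```python
def find_bt_cycle(start, broader):
    """Follow BT links depth-first from start.

    Returns a list of terms forming a loop (first term repeated at the
    end), or None if no loop is reachable from start.
    """
    path = []        # terms on the current chain, in order
    on_path = set()  # same terms, for fast membership tests
    done = set()     # terms already fully explored without finding a loop

    def visit(term):
        if term in on_path:
            return path[path.index(term):] + [term]
        if term in done:
            return None
        path.append(term)
        on_path.add(term)
        for parent in broader.get(term, []):
            cycle = visit(parent)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(term)
        done.add(term)
        return None

    return visit(start)

# Placeholder terms: A's broader term is B, B's is C, and C's is A again.
looped = {"Term A": ["Term B"], "Term B": ["Term C"], "Term C": ["Term A"]}
clean = {"Term A": ["Term B"], "Term B": ["Term C"]}
```

Running the check from every heading in the authority file would list each loop at least once. The recursion is adequate for a sketch, but a very deep hierarchy would call for an iterative version.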
There are a lot of patterns that are inconsistent and thus ambiguous to
computer interpretation. One recently discussed on the cataloging list
AUTOCAT is the question of what is going on with "Scottish literature"
vs. "English literature-Scottish authors" vs. "American literature" (not
to mention "Canadian literature" = Canadian literature in general and in
English vs. "French-Canadian literature" = Canadian literature in French
vs. "Nigerian literature (English)").
Relationships between terms are often important. Cookbooks (or cookery)
for children to use (currently "Cookery-Juvenile literature") are
different from cookbooks about cooking for children (which as best I can
determine is currently represented by the combination of
"Children-Nutrition" and "Cookery"), even if the two broad topical
keywords that might apply to both are cookbooks and children.
There are also situations, particularly with geographical subdivisions,
where the relationship between the main heading and the subdivision is
ambiguous. An example of this is a novel set in Jamaica about an English
detective which might have the following two subject headings:
1. "Detectives-England" (Here meaning English detectives; I once read a
claim that LCSH uses geographic subdivision quite often after classes of
persons when what it really means is people of a given nationality. You
can see this in structures like "Painters-United States-Biography" vs.
"Authors, American-Biography." Although I can see where you could get
into trouble trying to rigorously enforce the distinction.)
2. "Detectives-Jamaica" (Here meaning detectives in Jamaica)
Does "Exports-Japan" mean exports to or exports from? You can't get
things like Arab prisoners of war of the U.S. at Guantanamo Bay (or
British prisoners of the Japanese in China during WWII for that matter)
all in one string because you've got too many geographic facets (or what
gets turned into geographic facets in LCSH; nations as agent in Anderson
and Hofmann's analysis would be in another facet).
One thing I have also heard suggested, which might be helpful in
faceting, is to mark subject headings as belonging to certain groups,
such as classes of person (helpful with validating free-floating
subdivisions used only under certain types of headings if nothing else)
or by broad subject area (e.g., topics that are inherently legal or
economic in nature). These two things might possibly be fairly reliably
extracted on a large scale from patterns of use and correspondence
between subject headings and classification numbers.
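As a rough Python sketch of the second idea, one could tally the classification letters that co-occur with each heading and assign a broad area only when one letter clearly dominates. The sample pairs, the call numbers, and the 60% threshold below are all assumptions made for illustration, not real catalog data.

```python
from collections import Counter, defaultdict

def dominant_class(pairs, min_share=0.6):
    """Map each heading to its most common class letter, if it dominates.

    pairs is an iterable of (heading, call_number) tuples, one per use of
    the heading in a bibliographic record.
    """
    counts = defaultdict(Counter)
    for heading, call_number in pairs:
        counts[heading][call_number[:1].upper()] += 1
    result = {}
    for heading, counter in counts.items():
        letter, n = counter.most_common(1)[0]
        if n / sum(counter.values()) >= min_share:
            result[heading] = letter
    return result

# Invented sample data: (subject heading, call number on the same record).
SAMPLE = [
    ("Contracts", "KF801"),
    ("Contracts", "KF839"),
    ("Contracts", "HD2365"),
    ("Exports", "HF1414"),
    ("Exports", "K3943"),
]
areas = dominant_class(SAMPLE)
```

In this made-up sample, "Contracts" falls mostly in class K (law) and would be marked as legal, while "Exports" is split evenly and is left unassigned. A real run would need far more data and probably finer class ranges than a single letter.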
Anyway, I'm afraid this email is a bit of a jumble, but I hope that it
suggests some possible areas for potential examination.
-------------------------------------
Kelley McGrath
Cataloging & Metadata Services Librarian (Audiovisual)
Bracken Library
Ball State University
Muncie, IN 47306-0161
Phone: (765) 285-3350
kmcgrath_at_bsu.edu
Received on Mon May 14 2007 - 15:34:24 EDT