Re: The problem with OPACs [was: New subject keyword search]

From: McGrath, Kelley C. <kmcgrath_at_nyob> Date: Mon, 30 Jul 2007 08:17:47 -0400 To: NGC4LIB_at_listserv.nd.edu

[Mark K. Ehlert] wrote earlier:
>The details escape me now, but LC will begin (or has begun) 
>establishing more subdivided headings en masse, thus perhaps helping
>out in this department [i.e., indexing/searching LCSH headings with
>subdivisions].

Here is a link (PDF file) to the announcement I made vague reference to: 

<http://www.loc.gov/cds/notices/2007-05-25-Subject_Authority_Validation_Records.pdf>

-----------

I worry that explicitly establishing more and more combinations is completely the wrong direction to go in. It seems to me that if we thought of the elements of a pre-coordinated LCSH subject string as building blocks and pursued a solution more akin to the way OCLC Connexion's control headings function works (linking to authority records for each of the parts of a heading string), we would be better off in the long run.

Most of the LCSH heading strings that are not established editorially are constructed by attaching geographic and/or "free-floating" topical or form subdivisions to a base subject. Right now OCLC's control headings function does a fairly good job of knowing where a geographic subdivision should go in a given string because whether or not each element of a string can be geographically subdivided is coded in its authority record. Using the logic that the geographic subdivision should either go at the end of the string or immediately before the first element in the string that is not authorized for geographic subdivision, headings can be adjusted and updated automatically. The control headings function can also inform the cataloger that a heading cannot take a geographic subdivision at all.
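As a rough illustration of the placement logic just described, here is a minimal Python sketch. It is not OCLC's implementation. The lookup table MAY_SUBDIVIDE_GEOG and the function place_geographic are hypothetical names, and the flags in the table are illustrative stand-ins for what would really be read from each element's authority record.

```python
from typing import List, Optional

# Hypothetical, illustrative flags: may this element be subdivided geographically?
# In a real system this would come from the element's authority record.
MAY_SUBDIVIDE_GEOG = {
    "Youth": True,
    "Alcohol use": True,
    "History": False,
}


def place_geographic(elements: List[str], place: str) -> Optional[List[str]]:
    """Insert a geographic subdivision into a heading string.

    Returns None when the base heading cannot be subdivided geographically.
    Elements missing from the table are treated as not authorized
    (an assumption of this sketch).
    """
    if not elements or not MAY_SUBDIVIDE_GEOG.get(elements[0], False):
        return None
    # Put the place immediately before the first subdivision that is
    # not authorized for geographic subdivision ...
    for i, element in enumerate(elements[1:], start=1):
        if not MAY_SUBDIVIDE_GEOG.get(element, False):
            return elements[:i] + [place] + elements[i:]
    # ... otherwise at the end of the string.
    return elements + [place]


print("--".join(place_geographic(["Youth", "Alcohol use"], "United States")))
print("--".join(place_geographic(["Youth", "History"], "United States")))
```

Under these assumed flags, the first call puts the place at the end of the string and the second puts it before "History".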

Free-floating topical and form subdivisions are divided into those that are free-floating under any topical heading (any 650) and those that are only permitted under certain types of base headings (e.g., classes of persons, diseases). Nothing is coded in the authority record to say which category or categories a given heading falls into. However, it seems to me that there is no theoretical reason why this information could not be coded into authority records and combined with a set of rules that would enable a computer to validate any possible combination of elements in an LCSH string. Although this approach might be more complicated, it seems to me potentially more useful and less of a monumental undertaking than trying to explicitly establish individual combinations. It might also be more accurate. Our local system is set up to require authority records for individual combinations, but these authority records are sometimes created based on their use in our database without being sufficiently vetted, leading to the establishment of what should be mutually exclusive heading strings (e.g., at one time we had local authority records for both "Youth--Alcohol use--United States" and "Youth--United States--Alcohol use").

Of course, adding information about the categories into which headings fall would also be a huge undertaking, but I wonder if much of it could not be done automatically based on an analysis of the patterns of use of free-floating headings in a large database like WorldCat combined with inferences from broader and narrower terms. This becomes a little complicated because "Cancer" is a disease while "Cancer $x Patients" is a class of persons, but it is probably not impossible in most cases.
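To make the category idea concrete, here is a minimal Python sketch of rule-based validation, assuming the categories were coded somewhere a program could read them. Everything in it is hypothetical: the tables BASE_CATEGORIES and SUBDIVISIONS, the function is_valid_string, and the sample entries are illustrative rather than taken from actual authority data. Geographic subdivisions are left out to keep it short.

```python
from typing import List

ANY_TOPICAL = "any topical heading"

# Hypothetical coding: which categories does each base heading belong to?
BASE_CATEGORIES = {
    "Youth": {"classes of persons"},
    "Cancer": {"diseases"},
}

# Hypothetical coding for each free-floating subdivision:
#   (categories it may be used under,
#    categories of the resulting string, or None if unchanged)
SUBDIVISIONS = {
    "History": ({ANY_TOPICAL}, None),
    "Alcohol use": ({"classes of persons"}, None),
    "Patients": ({"diseases"}, {"classes of persons"}),
}


def is_valid_string(elements: List[str]) -> bool:
    """Check each subdivision against the category of what precedes it."""
    if not elements or elements[0] not in BASE_CATEGORIES:
        return False
    current = BASE_CATEGORIES[elements[0]]
    for subdivision in elements[1:]:
        if subdivision not in SUBDIVISIONS:
            return False
        allowed_under, resulting = SUBDIVISIONS[subdivision]
        if ANY_TOPICAL not in allowed_under and not (allowed_under & current):
            return False
        # A subdivision can change the category of the string so far,
        # e.g. a disease plus "Patients" becomes a class of persons.
        if resulting is not None:
            current = resulting
    return True


print(is_valid_string(["Cancer", "Patients", "Alcohol use"]))
print(is_valid_string(["Cancer", "Alcohol use"]))
```

Tracking the category of the string built so far, rather than only that of the base heading, is one way to handle the "Cancer" versus "Cancer $x Patients" case.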

I also think that it is possible to retain the benefits of pre-coordinated subject headings (precision, context for browsing) while overcoming some of the limitations of current implementation (cryptic syntax, inflexible citation order).

I do think we could do something to make the display of LCSH less cryptic. For example "$x History" as a subdivision generally means "history of" something. So what currently shows as "United States--History" could be displayed to users as "History of the United States" or "United States, History of" depending on context or need. Significant parts of the meaning of an LCSH string depend on citation order, but if these meanings were displayed more explicitly rather than just using "dash dash," I think they would be clearer to users and also could be presented in more flexible citation orders than is currently the case. Take, for example, "United States--Geography," which could be displayed as "Geography of the United States" and "Geography--United States," which could be displayed as "Geography (discipline) in the United States" (a distinction that would be totally lost in a naive conversion to a post-coordinate system using current LCSH). Many headings could be converted to clearer display forms using algorithms. In the cases that I can think of where this cannot be done in a rule-based fashion, it seems to mean that something is inadequate about our current syntax or encoding (i.e., the fact that we don't know whether "Detectives--Egypt" means "Detectives from Egypt/Egyptian detectives" or "Detectives in Egypt" is a weakness of our current system).
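A minimal Python sketch of what such a display algorithm might look like for two-element headings follows. The tables SUBDIVISION_TEMPLATES, PLACE_DISPLAY and DISCIPLINES, and the function display_form, are hypothetical and cover only the examples above. Anything the rules cannot interpret, such as "Detectives--Egypt", is returned unchanged rather than guessed at.

```python
# Hypothetical display templates keyed on the subdivision.
SUBDIVISION_TEMPLATES = {
    "History": "History of {base}",
    "Geography": "Geography of {base}",
}

# Hypothetical display forms for places (adds the article where needed).
PLACE_DISPLAY = {
    "United States": "the United States",
}

# Hypothetical list of base headings that name a discipline.
DISCIPLINES = {"Geography"}


def display_form(heading: str) -> str:
    """Turn a two-element 'Base--Subdivision' string into a readable phrase."""
    parts = heading.split("--")
    if len(parts) != 2:
        return heading  # longer strings are out of scope for this sketch
    base, subdivision = parts
    if subdivision in SUBDIVISION_TEMPLATES:
        shown_base = PLACE_DISPLAY.get(base, base)
        return SUBDIVISION_TEMPLATES[subdivision].format(base=shown_base)
    if base in DISCIPLINES and subdivision in PLACE_DISPLAY:
        return f"{base} (discipline) in {PLACE_DISPLAY[subdivision]}"
    return heading  # no rule applies; show the string as cataloged


for h in ("United States--History",
          "United States--Geography",
          "Geography--United States",
          "Detectives--Egypt"):
    print(h, "=>", display_form(h))
```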

The inflexibility of citation order in subject strings is only partially overcome by so-called rotating browse lists that provide entry points starting with each element of a heading string and which sometimes have significant drawbacks. In our OPAC, the way the headings are rotated means that there is no way for a user to distinguish between or do an effective search for "History--Philosophy" (philosophy of history) and "Philosophy--History" (history of philosophy) despite the fact that these are two very different things. If we displayed more explicit information about the relationships between the elements of a subject heading string, more citation orders would become possible without conflating things together incorrectly.

So I think if we thought of elements of LCSH as building blocks related to each other in a rule-based manner rather than hard-coded strings, we might be able to build a more useful system. It might also be more effective in helping with the problem mentioned in an earlier email about World War II in France and cross-references. If someone did that search, a computer could look for each individual word as well as groups of words in both the authorized headings and cross-references in authority records and find the best matches, weighting things that occur in a phrase or occur together, so that "world war II" could point to "World War, 1939-1945" and "france" could pick up subdivisions with variations on "France" and "French" (e.g., "Aerial operations, French").
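Here is a minimal Python sketch of that kind of matching, under loud assumptions: the RECORDS list stands in for authority records (the cross-references shown are illustrative), WORD_VARIANTS is a hypothetical table mapping "french" to "france", and the score is simply the longest run of query words found as a phrase in an authorized heading or cross-reference. A real system would need much better weighting.

```python
import re
from typing import List, Tuple

# Illustrative stand-ins for authority records.
RECORDS = [
    {"authorized": "World War, 1939-1945",
     "see_from": ["World War II", "Second World War"]},
    {"authorized": "France", "see_from": []},
    {"authorized": "Aerial operations, French", "see_from": []},
]

# Hypothetical table that folds word variants together.
WORD_VARIANTS = {"french": "france"}


def tokens(text: str) -> List[str]:
    """Lowercase, split into words, and normalize known variants."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [WORD_VARIANTS.get(w, w) for w in words]


def contains_phrase(label: List[str], phrase: List[str]) -> bool:
    n = len(phrase)
    return any(label[i:i + n] == phrase for i in range(len(label) - n + 1))


def score(query: str, record: dict) -> int:
    """Length of the longest run of query words found as a phrase."""
    q = tokens(query)
    best = 0
    for label in [record["authorized"]] + record["see_from"]:
        t = tokens(label)
        for start in range(len(q)):
            for end in range(len(q), start, -1):
                if end - start <= best:
                    break  # no longer run is possible from this start
                if contains_phrase(t, q[start:end]):
                    best = end - start
                    break
    return best


def best_matches(query: str) -> List[Tuple[str, int]]:
    """Authorized headings with a nonzero score, best first."""
    scored = [(r["authorized"], score(query, r)) for r in RECORDS]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: -s[1])


print(best_matches("world war II in france"))
```

Because the cross-reference "World War II" matches three query words in a row, "World War, 1939-1945" ranks above the single-word matches on "France" and "Aerial operations, French".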

Sometimes I think defenders of so-called traditional cataloging focus too much on the "how" (which IMHO often IS tied to practices that were optimized for the card catalog environment) and not enough on the what and why (the functionality we're trying to provide and the purpose of that functionality). It seems to me that the most promising future for cataloging is one that includes interfaces developed with sufficient input from catalogers that the interfaces use our metadata effectively because they understand what it's trying to do and in which catalogers produce data that is more effective than our current data at furthering our end objectives in an online environment.

-------------------------------------
Kelley McGrath
Cataloging & Metadata Services Librarian (Audiovisual)
Bracken Library
Ball State University
Muncie, IN 47306-0161
Phone: (765) 285-3350
kmcgrath_at_bsu.edu