Re: Discussion of id.loc.gov

From: McGrath, Kelley C. <kmcgrath_at_nyob> Date: Tue, 19 May 2009 13:42:10 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

With LCSH, as far as I understand it, it used to be that only two types of heading strings (with more than one subfield) were explicitly established (and thus had their own identifier):

1. Where a heading could not be constructed from the instructions for free-floating subdivisions, from patterns (as are common in music), or by subdividing geographically, and therefore had to be established editorially.

2. Where a cross-reference was wanted for a string that could otherwise have been legitimately constructed according to the rules. An example would be "Diabetes $x Patients." "Patients" is free-floating under disease names, but this heading was established so that an explicit cross-reference could be made from "Diabetics."

This changed fairly recently when LC, in response to popular demand, decided to start separately establishing strings that could be derived from the rules and that don't need x-refs. This is apparently because many ILSs require a corresponding authority record for an entire subject heading string in order to validate it. Any subject authority record with a 667 field that says "Record generated for validation purposes" is one of these.

To my mind, this is a misguided and short-sighted approach. We would be better off trying to make the rules for combining subject terms more machine-friendly, so that they can be manipulated and validated more usefully and systematically. The ILSs should be helping us; we should not be contorting ourselves to help the ILSs (although, to be fair, someone probably told them it should work that way).

Even leaving out the LCSH equivalents of "colorless green ideas sleep furiously," the number of valid LCSH combinations is so high that it is not practical to try to establish them all editorially, much less to try to keep them updated. In fact, the occasional incorrect combination has been known to sneak into LC's project and it has certainly happened in our local catalog.

I think what OCLC has been trying to do is a better approach, where each of the separate parts of a string is controlled separately. So the string

   "Aircraft accidents $z Alaska $x History $v Fiction."

can be linked to four authority records:

   sh 85001323 $z n  79018447 $x sh 99005024 $v sh 99001562

Except for certain kinds of pattern headings (mostly music headings and forms similar to "$x Religious aspects $x Buddhism, [Christianity, etc.]"), an authority record should exist for each part of a subject string.
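As a rough illustration of controlling each part separately, here is a minimal Python sketch that splits a subject string on its subfield codes and looks each component up in an authority file. The lookup table `AUTHORITY_IDS` and the functions `split_heading` and `link_components` are hypothetical; only the four identifiers come from the example above, and a real system would query actual authority records rather than a dictionary.

```python
import re

# Hypothetical stand-in for an authority file: maps each heading
# component to its authority record identifier (IDs from the example above).
AUTHORITY_IDS = {
    "Aircraft accidents": "sh 85001323",
    "Alaska": "n  79018447",
    "History": "sh 99005024",
    "Fiction": "sh 99001562",
}


def split_heading(heading):
    """Split a subject string into (subfield code, term) pairs.

    The main heading is treated as subfield 'a'. The terminal period is
    dropped (naively; a term ending in an abbreviation would lose its period).
    """
    # With a capture group, re.split returns [main, code1, term1, code2, term2, ...]
    parts = re.split(r"\s*\$([a-z])\s*", heading.strip().rstrip("."))
    pairs = [("a", parts[0])]
    pairs += list(zip(parts[1::2], parts[2::2]))
    return pairs


def link_components(heading, authorities=AUTHORITY_IDS):
    """Return (code, term, authority ID or None) for each component."""
    return [(code, term, authorities.get(term))
            for code, term in split_heading(heading)]


if __name__ == "__main__":
    for code, term, auth_id in link_components(
            "Aircraft accidents $z Alaska $x History $v Fiction."):
        print(f"${code} {term!r:24} -> {auth_id}")
```

A component that comes back with `None` has no authority record of its own, which is exactly the case (pattern headings aside) that a component-level approach would flag for attention.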

OCLC can only go so far with this, because much of the information we would need for complete validation is not encoded in our authority records. Their validation routine is pretty good at checking the order of subdivisions, but it knows nothing about types of headings (e.g., diseases, classes of persons), so it merrily validates nonsensical combinations such as "English literature $x Patients." The subdivision records generally tell you (in both coded and plain-English forms) what types of headings they can be used with, but the main headings don't tell you what type they are.

However, if we put the effort now going into these "records generated for validation purposes" into categorizing the existing headings, so that we know which ones are types of literature and which are names of diseases, and into coming up with lists and rules for the parts of pattern headings, we would have a more rigorous validation system. It would also enable better display of headings (grouping all the diseases or all the religious art together, and other sorts of interesting manipulations).
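To make the point concrete, here is a minimal Python sketch of type-aware validation, assuming main headings carried a category. The tables `HEADING_TYPES` and `SUBDIVISION_USAGE` and the function `is_valid_combination` are hypothetical; the only rule encoded is the one from the post, that "Patients" is free-floating under disease names. Main headings with no recorded type are rejected conservatively, which is the gap the post describes: without that categorization, "English literature" cannot be told apart from "Diabetes".

```python
# Hypothetical categories for main headings. LCSH main-heading records do not
# currently carry this information; that is the gap described above.
HEADING_TYPES = {
    "Diabetes": {"diseases"},
    "English literature": {"literatures"},
}

# Hypothetical usage constraints for free-floating subdivisions, of the kind
# the subdivision records already state: which heading types each may follow.
SUBDIVISION_USAGE = {
    "Patients": {"diseases"},
}


def is_valid_combination(main_heading, subdivision):
    """Return True if the subdivision may be used under the main heading.

    Unknown subdivisions and untyped main headings are rejected
    conservatively; a real system might flag them for review instead.
    """
    allowed_types = SUBDIVISION_USAGE.get(subdivision)
    if allowed_types is None:
        return False  # no usage rule recorded for this subdivision
    heading_types = HEADING_TYPES.get(main_heading, set())
    return bool(heading_types & allowed_types)


if __name__ == "__main__":
    print(is_valid_combination("Diabetes", "Patients"))            # accepted
    print(is_valid_combination("English literature", "Patients"))  # rejected
```

The same category data could drive display as well as validation, for example by collecting every heading whose type set includes "diseases".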

The only reason that there are separately established strings and identifiers in some cases and not in others is that those are the ones LC has gotten around to. Although I think they were planning to start with the more common ones, it's nevertheless a fairly arbitrary distinction. So it seems to me that it would be better to have a more hospitable system that supported the combination of parts rather than trying to represent some of the strings as wholes, but not others.

It is true that the specificity of LCSH, although a strength from many perspectives, also hinders access in some cases. Much of that could be ameliorated with better navigation of the existing strings (there have been proposals for this for many years, but nothing ever seems to happen) and a more rigorous syndetic structure. The existing order of facets doesn't serve all users, but there's no reason more flexible and useful displays can't be developed even from the existing facets.

I don't think we want to lose the relationships between terms by completely deconstructing the strings, but I think we do need to get them into a form where they are more amenable to manipulation.

Kelley McGrath
kmcgrath_at_bsu.edu