Re: Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

From: Patrick Etienne <patrick.etienne_at_nyob>
Date: Tue, 29 Jun 2010 20:16:02 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

NGC4LIB Community -

I'm another lurker on the list and will also take up Eric's challenge.

I'm a technologist, not a librarian, but I do spend a significant amount of
time thinking about information architecture and user interfaces. Lately, a
lot of my time has been spent evaluating a wide variety of discovery tools.
While I can't comment specifically on traditional catalogs, I feel I can say
something meaningful about the search processes various library audiences
(undergrads, grads, profs, faculty, library staff, John Q. Public, etc.) go
through in their quest for knowledge or information.

It is meaningful to think about searching within two different contexts:
*) Searches where there is a specific document (article, book, media file)
in question.
*) Searches where there is no specific document in question, rather an
exploration of available information.

Does it make sense to use a search axis such as author, title or subject?
My question in response would be: by what metric do we judge the effectiveness
of such an axis? I can think of none better than this: does searching on this
axis both significantly and dependably narrow the field to relevant results?
(This may be a bit basic, but from previous conversation it seems, if not
necessary, at least appropriate).
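
To make that metric a little more concrete, here is a minimal Python sketch. It covers only the "significantly narrow" half of the question (it says nothing about dependability or relevance), and the toy catalog, the field names, and the function average_remaining_fraction are all invented for illustration.

```python
from collections import Counter

# Hypothetical toy catalog; every record and field name is invented for illustration.
CATALOG = [
    {"author": "Frank Herbert", "title": "Dune", "subject": "Science fiction"},
    {"author": "Frank Herbert", "title": "Dune Messiah", "subject": "Science fiction"},
    {"author": "Ursula K. Le Guin", "title": "The Dispossessed", "subject": "Science fiction"},
    {"author": "Isaac Asimov", "title": "Foundation", "subject": "Science fiction"},
    {"author": "Thomas Kuhn", "title": "The Structure of Scientific Revolutions",
     "subject": "Philosophy of science"},
]


def average_remaining_fraction(records, axis):
    """Average share of the catalog left after filtering on one value of `axis`.

    Assumes the searcher wants a randomly chosen document and knows that
    document's value on the axis. Lower means the axis narrows the field more.
    """
    counts = Counter(record[axis] for record in records)
    total = len(records)
    return sum(counts[record[axis]] / total for record in records) / total


for axis in ("title", "author", "subject"):
    print(axis, round(average_remaining_fraction(CATALOG, axis), 2))
```

On this toy data, title leaves the smallest share of the catalog, author a little more, and subject by far the most, which is the pattern argued for in the paragraphs that follow.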

For the first search context, author and title do well. There are a limited
number of authors by any one given name, and each would have a manageable
number of works. In short, there's very little *ambiguity* within a search
involving an author's name or the name of one of his works. Each of these
has an exact "right answer" (with leniency for things such as international
spellings of an author's name, use of a middle name, or whether a title's
subtitle is included). To this, we could add other axes that have an exact
right answer: for example, who the publisher of the document was, or when
the document was published (again, the idea of multiple editions of a
document may blur this line, but even that could be accounted for).

For the second search context (that of exploration), the author context is
much less likely to be beneficial (unless of course the exploration in
question is of a given person, the author). The title search axis would
provide very little benefit, if any at all. Here as well, other axes such as
publisher, date published, or edition relate to specifics and would not be
helpful in an exploration type query.

So far, I've written nothing that I imagine anyone would disagree with or
have any problems with. In essence, up to this point I'm just making sure
we're all on the same page. There are, however, significant pieces I've left
out of the above evaluation (this is where the discussion gets
interesting).

It's certainly the case that sometimes people will have a specific document
in mind but do not know specifics such as the author, title, or publisher
(or ISBN). Not knowing these kinds of identifiers which significantly narrow
down the scope of inquiry makes for a broad search indeed. With these
conditions, I'd have to say that there's no significant difference between a
specific document search and an exploratory search. *I believe the real key
is how these kinds of exploratory searches are performed.* There is a lot of
focus on "Subject" as a search axis in the library world. My understanding
is that there is a standard by which "subject headings" are assigned to
specific documents and that this is where results based off subject axis
queries originate. I also believe that documents are cataloged under a
particular subject heading, but not more than one (I could be wrong about
this. Part of the reason for my post is to better understand the library
world). The trouble that arises is (from my understanding) that subject is
by no means an unambiguous axis on which to search, and that there is
significant effort put forth in the library world toward disambiguating
subjects within a one-to-one hierarchical methodology. (Now what in the
world does he mean by that?) I'm glad you asked! Again, I'm not a librarian
by trade, so I may have some details mixed up. Please do offer guidance or
correction for where I may have veered off.

I think it comes down to subject headings. If, for each document there is
one-and-only-one subject heading, and if subject headings are arranged in
hierarchies, then my understanding is correct. My argument would be that the
more electronic content/media proliferates, the less viable this
1:1/hierarchical model becomes in representing both digital materials and,
perhaps more importantly, the way library audiences perceive the
classification of these materials. Why is this so? First, the
more electronic content/media we have, the more exposed the general public
will be to the classifications or metadata we use to "catalog" this content
(because the demand for autonomous rather than assisted search rises).
Second, the more ambiguous the search axis (subject vs. author), the more
the individual's perception of the world (or content) colors or shapes
his/her search terms. When a query becomes so ambiguous that two very
intelligent people will each expect to find a particular document using two
single but very different search terms, the one-and-only-one relational
model of subject to document breaks down.
Following this, I don't think it's difficult to understand that hierarchies
of subject headings would also break down. If two people can't agree on
one term, how much less on a hierarchy of terms?
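
The breakdown can be sketched in a few lines of Python. This assumes the one-heading-per-document model as described in this post (which, as noted, may not match actual cataloging practice); the document, the heading, the tags, and both function names are invented for illustration.

```python
# Hypothetical data: one document, indexed two different ways.
ONE_HEADING = {
    "Dune": "Science fiction",  # exactly one subject heading per document
}
MANY_TAGS = {
    "Dune": {"science fiction", "ecology", "politics", "religion"},  # one-to-many
}


def search_one_to_one(index, term):
    """Return documents whose single heading matches the search term."""
    return [doc for doc, heading in index.items() if heading.lower() == term.lower()]


def search_one_to_many(index, term):
    """Return documents that carry the search term among their tags."""
    return [doc for doc, tags in index.items() if term.lower() in tags]


# Two searchers expect the same book under two different terms.
for term in ("science fiction", "ecology"):
    print(term, search_one_to_one(ONE_HEADING, term), search_one_to_many(MANY_TAGS, term))
```

The searcher who thinks of the book as being about ecology comes up empty under the single-heading index, while the one-to-many index serves both searchers.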

For those that are still with me, I appreciate it. I've not tried to be
exhaustive, but indeed thorough.

Here's what you've likely been waiting for. My contention is that
"cataloging" of the future must needs be one-to-many and non-hierarchical.
We need something that can account for the (perhaps even psychological) ways
in which people view both the world and content differently. That is to say,
we need tagging systems.

*Gasp...
[silence]
Tagging!? Heretic! Burn him!

Rather than specific subject headings, I believe we need authoritative
groups that can provide the knowledge and experience necessary to assemble
intelligent tags for particular resources (similar to LCSH, MeSH, or CSH,
but smaller, more numerous, and more flexible). At the same time, I believe
we also need non-authoritative groups that can provide intelligent tags for
those resources. These tagging bodies could be anything from
Joey Blogg's review of Frank Herbert's Dune on Amazon (a body of low
authority) to a body of Harvard Law professors (a body of high authority)
tagging resources which relate to their field. Individuals could then
perform searches based on tag collections formed by some chosen level of
authority. Again, I'm not saying this would be easy to do. I am saying that,
at the moment, this looks like the direction in which we have to
move. The real questions are: 1) can we agree on a standard for representing
tags associated with a document, 2) can we agree on a method of interchange
for tag collections, and 3) can application programmers provide ways to
include, exclude (or possibly even prioritize) different tag collections?
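
As a rough illustration of the third question, here is a minimal Python sketch of searching across tag collections with include, exclude, and prioritize controls. Everything in it is an assumption made for the example: the TagCollection structure, the 0-to-1 authority score, the source names, and the sample tags. It is not a proposed standard or an existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TagCollection:
    """Tags assigned by one tagging body; `authority` is a hypothetical 0-1 score."""
    source: str
    authority: float
    tags: Dict[str, Set[str]] = field(default_factory=dict)  # document -> tags


def search(collections, term, min_authority=0.0, exclude=()):
    """Return documents tagged with `term`, ranked by summed authority.

    Collections below `min_authority` or named in `exclude` are ignored;
    ties are broken alphabetically by document name.
    """
    scores = {}
    for collection in collections:
        if collection.authority < min_authority or collection.source in exclude:
            continue
        for doc, tags in collection.tags.items():
            if term in tags:
                scores[doc] = scores.get(doc, 0.0) + collection.authority
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical tagging bodies of low and high authority.
COLLECTIONS = [
    TagCollection("amazon-reviewer", 0.2, {
        "Dune": {"politics", "desert"},
        "The Federalist Papers": {"politics"},
    }),
    TagCollection("law-faculty", 0.9, {
        "The Federalist Papers": {"constitutional law", "politics"},
    }),
]

print(search(COLLECTIONS, "politics"))                           # prioritize by authority
print(search(COLLECTIONS, "politics", min_authority=0.5))        # high authority only
print(search(COLLECTIONS, "politics", exclude={"law-faculty"}))  # exclude one body
```

Summing authority is only one possible way to prioritize; a real system might weight, cap, or simply filter instead.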

And just to make sure it's clear, I'm not saying that traditional cataloging
is dead. If it were, we'd have a vacuum, for there's nothing currently (that
I'm aware of) to take its place. I am saying that the system I've
outlined above is the direction we are likely to move in the future, away
from the monolithic, inflexible bodies governing traditional
1:1/hierarchical methodologies to the distributed, flexible bodies governing
smaller 1:M/non-hierarchical tag-based systems.

---

Hopefully this post will get some gears turning and inspire some thought.

 - Patrick E.