This is not particularly helpful in this case, but...
One idea that can be found in Query By Example (QBE) engines used for
image retrieval is to create a composite representation of an image, and
then use it as a sort of fingerprint for finding similar content. If you had the
full text of the objects gathered under a particular LCSH, and used
something like Latent Semantic Indexing (LSI) or other techniques that try
to identify relationships between underlying terms and documents, it might
be possible to use a QBE approach where the content that best matches the
most common composite is a good indicator of the most representative
sample in a collection. Of course, for fiction, the most representative
sample might actually be the worst read of the lot since it would likely
be the most formulaic. But maybe that would make it the most relevant?
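To make the idea concrete, here's a minimal sketch of what that could look like, using plain NumPy rather than Lucene: build a toy term-document matrix, reduce it with a truncated SVD (the core of LSI), take the collection centroid in the latent space as the "composite," and rank documents by cosine similarity to it. The matrix values and k are made up for illustration; in practice the counts would come from an index's term vectors.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# In practice these counts would come from an index, e.g. Lucene term vectors.
td = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 2, 2],
], dtype=float)

# LSI step: truncated SVD keeps the k strongest latent "concepts".
k = 2
U, s, Vt = np.linalg.svd(td, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document, in latent space

# The "composite" query: the centroid of the collection in latent space.
composite = doc_vecs.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# QBE step: the document closest to the composite is the candidate
# "most representative" item in the collection.
scores = [cosine(v, composite) for v in doc_vecs]
most_representative = int(np.argmax(scores))
```

This is only the shape of the approach; a real experiment would need weighting (tf-idf), a sensible k, and far larger matrices.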
QBE would be really interesting for content like book cover images, and
Lucene has the term vector handling to at least experiment with LSI
without the need for super-powered hardware. Not that the full text of all
of the objects in most of our collections is readily available, and not
to detract in any way from the need for rich metadata, but I think an
important aspect of tomorrow's NGCs will be how they leverage full text
and other associated content. That's where the break with the card
catalogue seems to be the most profound.
art
Received on Mon Feb 05 2007 - 22:51:21 EST