I'm confused about what you're trying to do, what the goal is. What is
the use case? What is the user doing, and what do you want the system
to do in response?
What I can figure out from your post is maybe:
1) User chooses to look at all the books belonging to a particular LCSH
subject.
2) But you want to put some of these books higher in the list than
others, as being "more" relevant to that subject.
Is that what you're trying to do? There are a whole bunch of things
that could be done involving LCSH and relevancy ranking. This is just
one, and to me not necessarily the most interesting one.
Most interesting to me would be actually ranking the LCSH headings
themselves:
1) User does some search, entering some words in a box.
2) Identify which LCSH subjects might be relevant to that search. Don't
just generate a set, generate a _ranked_ set, from "most likely to be
relevant to your search" to "least likely".
To me, that's a lot more productive and interesting a thing to look at
involving 'relevancy ranking LCSH'. I think there has actually been
some writing on how you might approach that in the Library Science
literature, maybe even some research projects here and there, although
as we all know, not nearly as much implementation and actual
experimentation as we'd want.
(Talking about features of LCSH not present in typical 'folksonomies',
the most significant are the relationships between LCSH headings, as
well as the lead-in terms (aka "Used for" references, aka
'non-preferred terms'). There are some interesting things you could do
with these.)
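To make the heading-ranking idea a bit more concrete, here is a minimal
sketch in Python. It scores each heading by how many query words match
the heading itself or one of its lead-in terms. Everything in it is an
assumption for illustration: the toy HEADINGS data, the rank_headings
name, and the weights are made up, and a real system would want
stemming, the broader/narrower relationships, and real authority data.

```python
import re

# Hypothetical toy data: preferred heading -> lead-in ("Used for") terms.
# Not taken from real authority records.
HEADINGS = {
    "Dystopias": ["Anti-utopias"],
    "Utopias": ["Ideal states"],
    "Totalitarianism": ["Totalitarian state"],
}


def tokens(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_headings(query, headings, preferred_weight=2.0, lead_in_weight=1.0):
    """Return (heading, score) pairs, most likely relevant first.

    A query word matching the preferred heading counts more than one
    matching a lead-in term; both weights are arbitrary assumptions.
    Headings with no matching words are left out.
    """
    query_words = set(tokens(query))
    scored = []
    for heading, lead_ins in headings.items():
        score = preferred_weight * len(query_words & set(tokens(heading)))
        for term in lead_ins:
            score += lead_in_weight * len(query_words & set(tokens(term)))
        if score > 0:
            scored.append((heading, score))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranked = rank_headings("utopias", HEADINGS)
```

With this toy data a search for "utopias" puts "Utopias" first and
"Dystopias" second, the latter only because of its lead-in term.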
If we are talking about ranking books _within_ a certain LCSH subject,
though, I'm not sure what our goal would be. Do we want a book to show
up higher if it's somehow "more" about that subject than other books?
What does that even mean? More of the book concerns this topic? This
topic is more central to the book? Hmm. In your 'folksonomy' example we
know exactly what it means: a whole bunch of people thought that
"dystopia <http://www.librarything.com/tag/dystopia>" was an appropriate
tag for the book 1984. This is very useful information in a folksonomy
environment, because we don't know how 'trustworthy' the tags are, and
this is one way of deciding that a tag is trustworthy. But with LCSH, the
assumption is that all the assigned headings are trustworthy to begin
with. Is part of the 'problem statement' here that this may not in fact
be true? I'm not sure what we're trying to do. If we could magically
get the system to do whatever we wanted---what is it we want exactly?
To me, narrowing down the set by 'profiling' in some way seems more
promising than relevancy ranking of books within a heading set. By
'profiling', I mean showing the user some sub-divisions (with total
number of 'hits' within each one shown) that the user can choose from to
"drill down". These could be other LCSH terms (sub-terms and/or other
terms that happen to co-exist in the search set, etc.), 'folksonomy'
tags, headings from other controlled vocabularies, statistically derived
clusters, or anything else. Gary Marchionini's "Relation Browser" is
one interface for letting users profile/explore a collection or
result set on multiple dimensions at once.
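As a rough illustration of what I mean by 'profiling', here is a small
Python sketch that takes the books under one heading and counts the
other headings that co-occur in that set, giving the hit totals you
would show next to each "drill down" choice. The record layout (a
"subjects" list per book), the sample RECORDS, and the function names
are all hypothetical; the same counting would work for tags or any
other facet.

```python
from collections import Counter

# Hypothetical sample records; titles and subject assignments are made up.
RECORDS = [
    {"title": "Book A", "subjects": ["Dystopias", "Totalitarianism"]},
    {"title": "Book B", "subjects": ["Dystopias", "Totalitarianism", "Satire"]},
    {"title": "Book C", "subjects": ["Dystopias"]},
    {"title": "Book D", "subjects": ["Utopias", "Satire"]},
]


def facet_counts(records, selected, field="subjects"):
    """Count the other values of `field` among records that have `selected`.

    Returns (value, hit_count) pairs, largest count first, with ties
    broken alphabetically.
    """
    counts = Counter()
    for record in records:
        values = set(record.get(field, []))
        if selected in values:
            counts.update(values - {selected})
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def drill_down(records, selected, refinement, field="subjects"):
    """Return the records that carry both the selected and refining value."""
    wanted = {selected, refinement}
    return [r for r in records if wanted <= set(r.get(field, []))]


facets = facet_counts(RECORDS, "Dystopias")
narrowed = drill_down(RECORDS, "Dystopias", "Satire")
```

Here the user looking at "Dystopias" would be offered "Totalitarianism"
(2 hits) and "Satire" (1 hit) as ways to narrow the set.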
Jonathan
Tim Spalding wrote:
> I just wrote up a blog post about trying to tease relevancy ranking
> from LCSHs:
>
> http://www.librarything.com/thingology/2007/02/can-subjects-be-relevancy-ranked.php
>
>
> I wonder if anyone has made, seen or can think of any good methods to
> do it. So far I've only seen non-ranked and popularity-ranked results.
> In the blog post I talk about playing around with how LCSHs "reinforce"
> each other statistically, but I couldn't get the algorithm to produce
> good results more than sporadically.
>
> I'm not sure if this is a cataloging or a coding. Maybe that's the point.
>
> Tim
>
--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Received on Mon Feb 05 2007 - 15:05:18 EST