Re: Subject Access -- Costs

From: Corey Harper <corey.harper_at_nyob>
Date: Thu, 24 May 2007 12:01:22 -0400
To: NGC4LIB_at_listserv.nd.edu

I'm struck by the notion of "automatic classification" - canard or not.
I wonder if there's anything useful that might come from experiments
with automatically generating provisional LCSH based on keyword tags
added by users.  This could provide a starting point for LCSH
assignment, and the distributed assignment effort of catalogers might
then shift to maintenance, revision & assessment.  Are these activities
less "costly" than up-front assignment?

LibraryThing seems like the perfect playground for such an experiment.
One could scan tags against the LCSH vocabulary, first checking headings,
then use-for type x-refs, then broader, narrower and related terms, and
maybe even look at the co-occurrence of tags as an initial means of
matching to pre-coordinated headings in cases where authority records
for subdivided topics exist.  Depending on the level of info contained
in the LCSH data searched against, an algorithm could even match a
heading and a subdivision authority record, and check the coding to see
if the two are allowed to be used together.
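
To make the matching order concrete, here is a rough Python sketch of the
cascade described above.  Everything in it is an assumption made for
illustration: the Heading class, its field names, and the sample records
are invented and not taken from the real authority file.  It tries
established headings first, then use-for references, returns the
broader/narrower/related terms as further candidates for a cataloger to
review, and looks for pairs of co-occurring tags that form an existing
subdivided heading.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Heading:
    """A toy authority record; the field names are assumptions."""
    label: str
    use_for: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def normalize(text):
    """Lowercase and collapse whitespace so tags and headings compare."""
    return " ".join(text.lower().split())


def build_indexes(headings):
    """Index records by established heading and by use-for variant."""
    by_label = {normalize(h.label): h for h in headings}
    by_use_for = {}
    for h in headings:
        for variant in h.use_for:
            by_use_for.setdefault(normalize(variant), h)
    return by_label, by_use_for


def suggest(tag, by_label, by_use_for):
    """Return a provisional heading for one tag, or None if nothing matches."""
    key = normalize(tag)
    # Established headings are checked before use-for references.
    for via, index in (("heading", by_label), ("use_for", by_use_for)):
        if key in index:
            h = index[key]
            return {
                "heading": h.label,
                "via": via,
                # Neighboring terms are offered for review, not assigned.
                "neighbors": h.broader + h.narrower + h.related,
            }
    return None


def precoordinated(tags, by_label):
    """Find subdivided headings formed by two tags on the same item."""
    found = []
    for a, b in permutations(sorted(set(map(normalize, tags))), 2):
        h = by_label.get(a + "--" + b)
        if h:
            found.append(h.label)
    return found


# Made-up sample data, not copied from the real authority file.
SAMPLE = [
    Heading("Cats", use_for=["Domestic cats"], broader=["Pets"]),
    Heading("Cats--Behavior"),
    Heading("Pets", narrower=["Cats"]),
]
```

With the sample records, suggest("domestic cats", *build_indexes(SAMPLE))
would come back as a use-for match on "Cats".  Exact string matching is
the simplest possible choice here; stemming or fuzzy matching would
probably be needed for real user tags.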

What would a good machine-readable data format to support this kind of
work look like?  RDF, and specifically SKOS, comes to mind, but I'm
curious what other thoughts people might have.  Would it be possible to
develop such a system using our current MARC authorities as a data-set?
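
As a hedged illustration of how MARC authorities might feed a SKOS-style
data-set, the Python sketch below maps a hand-built topical authority
record to (subject, predicate, object) triples.  The input layout, the
function names and the example.org URI are all assumptions; a real
conversion would use a proper MARC parser and an RDF library.  The
mapping assumed is 150 to skos:prefLabel, 450 to skos:altLabel, and 550
to skos:broader, skos:narrower or skos:related depending on the first
character of subfield $w (g, h, or anything else).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Subfields that contribute to the displayed heading string.
HEADING_SUBFIELDS = ("a", "x", "y", "z", "v")


def heading_string(subfields):
    """Join the main term and any subdivisions with a double hyphen."""
    return "--".join(v for code, v in subfields if code in HEADING_SUBFIELDS)


def authority_to_triples(uri, fields):
    """Map simplified MARC authority fields to SKOS-style triples.

    `fields` is a list of (tag, [(subfield_code, value), ...]) pairs,
    a hypothetical stand-in for a parsed MARC record.
    """
    triples = []
    for tag, subfields in fields:
        label = heading_string(subfields)
        if tag == "150":    # established topical heading
            triples.append((uri, SKOS + "prefLabel", label))
        elif tag == "450":  # see-from tracing (use-for reference)
            triples.append((uri, SKOS + "altLabel", label))
        elif tag == "550":  # see-also-from tracing
            control = dict(subfields).get("w", "")
            relation = {"g": "broader", "h": "narrower"}.get(control[:1], "related")
            # Simplification: the object is a label here; real SKOS would
            # point at the URI of the target concept instead.
            triples.append((uri, SKOS + relation, label))
    return triples
```

For example, a record with 150 $a Cats, 450 $a Domestic cats and
550 $w g $a Pets would yield one prefLabel, one altLabel and one broader
triple.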

-Corey

Tim Spalding wrote:
> The secondary benefits of killing it--that it will break a mindset--or
> do you think the benefits outweigh the costs overall?
>
> I'm all in favor of letting a thousand flowers bloom here. Tags,
> keyword, distributed LCSHing*, even "automatic classification" (a
> canard). Heck, I've tried to get people into reviving the original
> Cutter Classification.
>
> But killing LCSH seems unlikely soon—and so a waste of our
> resources—and somewhat mean-spirited and hostile. I don't mean that
> anyone on this LIST is mean-spirited or hostile; quite the contrary!
> But the idea is. If LCSH dies, it should do so because everyone is
> overjoyed with actual, demonstrated and widely-understood value of its
> alternatives, not the potential value, and not because someone wants
> to save a little money.
>
> Anyway, as has been pointed out, you distribute the assigning long
> before you distribute the care and feeding of the system. Why not pick
> some books and start playing with distributed LCSHing? Can librarians
> around the world assign good LCSHs? Can regular people? Can we design
> a system that allows everyone to participate, but only the best stuff
> to stand?
>
> I have, quite coincidentally, a good start on a very easy
> LCSH-assigning GUI, based on the full LC Authorities file that Simon
> Spero scraped. I was going to release it as a sort of half-joke
> "Del.icio.us for LCSH." But it could be LCSH for library items
> instead, and be a key piece in a "Wikipedia for LCSH."
>
> Anyone want to do this, or are we just talking?
>
> *If LCSH wants to beat Google, it needs to become a verb ;)
>
> On 5/24/07, Karen Coyle <kcoyle_at_ix.netcom.com> wrote:
>> Shouldn't one also factor in benefits? Costs are not an absolute, but
>> relative to benefits.
>>
>> kc
>>
>> -----Original Message-----
>> >From: "Sperr, Edwin" <sperr_at_NELINET.NET>
>> >Sent: May 24, 2007 7:00 AM
>> >To: NGC4LIB_at_listserv.nd.edu
>> >Subject: [NGC4LIB] Subject Access -- Costs
>> >
>> >Tim poses a question that cuts to the heart of many of the debates over
>> >providing subject access through LCSH:
>> >"What *sort* of money are we talking about?".  To begin to answer that,
>> >we need to unpack the question of costs into at least two parts: 1) The
>> >cost of assigning LCSH headings to individual records, and 2) The cost
>> >of maintaining the entire LCSH ontology.
>> >
>> >The cost of adding and editing individual headings seems to be one of
>> >the main arguments advanced for dumping (Calhoun) or de-emphasizing
>> >(UC's BSTF) LCSH.  There is no doubt that it's relatively expensive for
>> >a person to analyze an item and then have to apply the proper heading.
>> >However, it is also expensive to do all the *other* things involved in
>> >acquiring and describing an item.  Folks start off with "There's a lot of
>> >waste and inefficiency in traditional cataloging" and move straight on
>> >into "Burn the Red Books!" without really touching on any of the steps
>> >in-between.  What *actual* part of the total cost is the
>> >subject-analysis-and-description-part?  Big decisions need to be driven
>> >by data instead of by anecdote (or the giddy thrill of saying something
>> >Really Subversive).
>> >
>> >It's certainly possible that a good portion of these costs could be
>> >lessened with better tools.  Indeed, Dorothea Salo recently
>> >(http://cavlec.yarinareth.net/archives/2007/04/24/irgrunt/) described
>> >looking in her catalog and finding that no two copies of Eugene Onegin
>> >had the same headings:
>> >
>> >        "This isn't a cataloguer problem. It's a tools-and-processes
>> >        problem. Cataloguing tools should have heuristics for
>> >        recognizing a new translation of Eugene Onegin and pulling up
>> >        the other records for such translations. Subject assignment at
>> >        that point should be point-and-click, accept what's already in
>> >        the catalogue (with, of course, an option to add new subjects
>> >        if truly necessary-which I can't imagine it is terribly
>> >        often!)."
>> >
>> >Again, this stuff isn't rocket science -- just missed opportunities.  It
>> >is to be hoped that as more librarians start building their own tools
>> >(just as with catalog discovery layers) that we'll see some progress on
>> >this front.
>> >
>> >
>> >As for the second question, I really have no idea.   We're currently in
>> >a situation (as much by historical accident as anything else) where LC
>> >is not merely the library for a very special constituency, but the
>> >de-facto national library as well. I imagine that riding herd over LCSH
>> >*is* expensive, and that to at least some folks over there it looks like
>> >something of an unfunded mandate.  Maybe we *do* need to start planning
>> >for a future where they just don't want to (or feel they can afford to)
>> >do it anymore.
>> >
>> >The problem for an LCSH sans LC is that controlled vocabularies have to
>> >be *controlled* somehow.  Theoretical rigor with regard to
>> >broader/narrower/related headings would be nice as well, but at
>> >*minimum* there needs to be a set of agreed upon terms.  Otherwise the
>> >whole notion of collocation falls apart.
>> >
>> >Do we invest The Power in a new authority (OCLC?, LibraryThing?) or do
>> >we attempt to decentralize?  What would a radically decentralized
>> >Controlled Vocabulary look like in theoretical terms? (no, tag-clouds
>> >don't count)  Are there any current examples we could look at for ideas?
>> >
>> >
>> >
>> >Ed Sperr
>> >Digital Services Consultant
>> >NELINET, Inc.
>> >153 Cordaville Rd. Suite 200  Southborough, MA
>> >(508) 597-1931  |  (800) 635-4638 x1931
>>
>>
>> Karen Coyle - on the Road
>> kcoyle_at_kcoyle.net
>> skype: kcoylenet
>>

--
Corey A Harper
Metadata Services Librarian
Bobst Library, B42-LL1
New York University
70 Washington Square South
New York, NY  10012
212.998.2479
corey.harper_at_nyu.edu