Re: Hot (MARC) metadata!

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Fri, 10 Aug 2007 10:35:52 +1000
To: NGC4LIB_at_listserv.nd.edu

Hi,

On 8/9/07, Ted P Gemberling <tgemberl_at_uab.edu> wrote:
> I don't think Nathan and I are arguing that people like you should be
> kicked out of libraries.

I'm glad you think so. But I'm also mystified as to why you would think
that's how you come across, or that that's what I fear. :) I certainly
hope I'm not coming across as being self-preservative in this ; my job
title and job have got nothing to do with AI. This is strictly me
tinkering in my spare time, and I suspect it will always stay that way
; as soon as I mention ontologies, AI or anything else that isn't what
we're currently doing, people's eyes glaze over.

No, I'm only talking about this because I'm passionately in love with
the library world and I'm trying to save it from its own destruction!
I might be wrong, of course, as so many in the library world are happy
to sit back and relax and not get their knickers in a knot over stuff
that will sort itself out, one way or another. Maybe they're right,
maybe caution and reluctance are a good strategy as the world whirls
past us (not that I'm painting a totally objective painting in this
narrative, but hey ... :), but I'm one of those people who can't sit
still if there's clouds on the horizon and our washing is on the line.

> We're basically, I think, just asking for caution. As I said in another
> post, people in 1969, at the time of the first moon landing, wouldn't
> have guessed that it would take 50 years to put people on the moon
> again. But that's apparently the case. Sometimes technology progresses
> faster than we think, and sometimes more slowly. Sometimes it
> "plateaus."

Actually (and I think someone else mentioned this) the technology
hasn't slowed ; politics and cost have. After 1969 and hitting the moon
there was no more *need* to go to the moon, as the political goals
were all achieved. This is why the ISS is such a farce, and it's why
going to Mars ain't happening anytime soon ; no political need.
However, the US presidente spent three moon-landing budgets going to
Iraq to, er, not save the oil. :)

> You mentioned a Google translation project that used a "human corpus." I
> don't know the details of that. But Google's translation feature on the
> search screen doesn't show much evidence of the power of artificial
> intelligence.

Ah, sorry for the confusion ; I didn't mean to imply that their
translation service uses AI. In fact, I did specify it used
statistics. It was just an example of a paradigm shift in thinking
about solving an old problem. As such, I think we, too, need to rethink
the whole problem space of "cataloging" ; we need to change a few
paradigms.

> One kind of metadata some people consider "hot" is ranking of hits by
> popularity. That's a relatively easy thing for a computer to do. All it
> has to do is count the number of times something has been checked out,
> or maybe how many times the cataloging record has been opened. But
> that's pretty useless to anybody but the most beginning student of a
> subject. It probably correlates quite well with the books you can't get
> for awhile, because they've been checked out by somebody.

Absolutely agree, although I did start a prototype of a "heat engine"
that uses a number of things (searches, checkouts, a book's
contextual proximity to other books that person or others have checked
out, deflation over time, etc) to measure a more realistic "hotness"
of a thing, including various contexts such as within "all books",
"all dehingerated plosmosis diagnostics books", "all searches today",
"book topics this month weighed against 5 popular news sites this
year", "pictures of people", "recordings by / about people in the
books searched this week", etc, etc. So many possibilities.
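
To make the idea concrete, here is a minimal sketch of such a score in Python. It is not the prototype mentioned above ; the event weights, the 30-day half-life and all the names (Event, WEIGHTS, hotness) are assumptions made up for illustration. It only covers searches, checkouts and decay over time, optionally restricted to one context, and leaves out contextual proximity entirely.

```python
from dataclasses import dataclass

# Hypothetical weights: the post gives no numbers, so these are guesses.
WEIGHTS = {"search": 1.0, "checkout": 3.0}

# Assumed half-life: an event counts half as much after this many days.
HALF_LIFE_DAYS = 30.0


@dataclass
class Event:
    item_id: str                 # identifier of the book (or other thing)
    kind: str                    # "search" or "checkout"
    age_days: float              # how long ago the event happened
    context: str = "all books"   # the context the event was recorded in


def hotness(events, item_id, context=None):
    """Sum the time-decayed weights of all events for one item.

    If context is given, only events recorded in that context count.
    """
    score = 0.0
    for event in events:
        if event.item_id != item_id:
            continue
        if context is not None and event.context != context:
            continue
        # Exponential decay: the weight halves every HALF_LIFE_DAYS.
        decay = 0.5 ** (event.age_days / HALF_LIFE_DAYS)
        score += WEIGHTS.get(event.kind, 0.0) * decay
    return score
```

Calling hotness(events, "some-id") gives the score across every context, while hotness(events, "some-id", context="all searches today") narrows it to one of the contexts listed above. A real engine would need per-context weights and some proximity measure on top of this.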

In fact this topic in itself is what agitates me a bit ; why aren't
we, the librarians and contextual experts, defining these scenarios
and writing software to match them? LibraryThing has a few things going
there, but there are so many more we could do, and possibly should do.
It's about catching trends, habits, quality and expertise in these
little models. Why aren't we working *really*hard* on this?

Here's an example ; for a student doing "Bollocks 101" I'm sure there
are plenty of contexts we can think of that would ease his pain, from
established checkout patterns by scholars within that field to what
other students are doing (checking out, searching, browsing), to
news items covering his topic, serials that match the subject,
biographies of important people within the field, etc. There are many
exciting things here for us to do, and I bet this is something very
few commercial companies will have the time and expertise to handle.
It should be the perfect library task!

> When I was in grad school in the 90's, the most important book for my
> research was a collection of Latin texts that was published about 1900
> and not checked out since at least 1950. So it would've ranked very low
> on any "popularity" scale. The popularity of a title has little relation
> to its relevance to a topic.

Ah, but with a sprinkle of social engineering that book would have
a great rating, tags to go with your research, comments by your
professor, etc, etc. And *then* AI would jump all over it and save the
world from itself. :)

> At ALA, Sally McCollum of LC talked about how marcxml is
> somewhat of an advance over marc.

I think that's so "somewhat" that it's almost useless. :) No one
treats MARC and MARC XML any differently as far as I know.

> It is interesting that Weinberger seems to think of RDF and the Semantic
> Web as another form of the "second order." Actually, I don't think he
> exactly says that, but he says they have many of the same problems as
> DDC or other traditional "second order" things. They're not "messy"
> enough, though he does think they have some value.

The SemWeb has an identity problem (as in Persistent Identification
and all ilk associated with it) which I think is larger than the low
messiness of it (I personally think it's messy enough, but you need
far too complex trust models to make it a snap). And for anyone
following the dreaded FRBR debate, identity troubles can be
devastating to its progress and usability.

Smiles,

Alex
--
 ---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------