Re: Relevance ranking: was Aqua Brow

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Sun, 6 Jan 2008 08:33:26 +1100
To: NGC4LIB_at_listserv.nd.edu
On Jan 5, 2008 11:47 PM, Weinheimer Jim <j.weinheimer_at_aur.edu> wrote:
> [...] but there are a few things that must be understood and accepted.

I think one of them must be that you two are talking slightly past
each other. First, I don't think anyone here actually disagrees with
your observations of the past, that the concept indeed has always been
around. I assert it still is. But that concept searching you're
talking about is fully human, it is you and me using our brains, and
few can argue with that. The instructions in old library systems were
basically "use your brain."

Then libraries shifted to computer-based catalogs, and indeed the
concept searching that we used our brains for needs to be translated
into keywords, fields, what have you. And again I don't think anyone
disagrees with Casey that this is damn hard. Heck, who would argue
that computers are smart enough to do what we used our brains for in
the past?

Oh dang, that's me.

Folks, computer software will not be getting dumber. Just like
anything else in the evolution of the human species, the software we
write evolves and becomes better. Jim might be sad that we lose the
intellectual humanness of our searching, and Casey might be sad that
the foundation upon which we are to write that software is crippled,
and as much as I share and agree with both of those sadnesses, I'm
even sadder that the library world doesn't seem to understand that,
in order to a) survive this imminent future and *especially* b) best
serve our patrons and humankind (!!), we must put our efforts into
1) cleaning up the foundation (meta data), and 2) transferring our
knowledge into a form that can be used by and for computers. No, meta
data simply isn't good enough. We need structures for our intellectual
efforts, better models, ontologies and processes that are not only up
to date but prepared for what happens in 50 years. We're currently in
the year 1875 (or thereabouts), and as much as nostalgia is nice and
all, it is a sad state of affairs.

...

> With fuzzy searching, for the sake of argument, I'll go ahead and
> grant that perhaps Google might find "Mark Twain" from a
> search for "Mark Tvein," but I absolutely refuse to believe it will
> find "Quintus Curtius Snodgrass."

Yesterday I watched "Becoming Jane" (slightly fictional flick about
Jane Austen) and wanted to quickly try to find out who Lefroy married
in Ireland, and put in "Jane Austin" by mistake. Google showed results
for Jane Austen without even a blink or a "did you mean?" What will
they think of next?

And you know, *of course* Google will find "Quintus Curtius
Snodgrass"; however, it will find it in the same manner you claim the
past did it so well: you search for "Mark Tvein", Google tells you
"Did you mean Mark Twain?", you find more stuff, go to Wikipedia, see
that he has pseudonyms and search Google for those as well.
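
As a rough illustration of the "did you mean" step only, here is a
minimal Python sketch using the standard library's difflib. The name
list and the did_you_mean helper are made up for the example, and this
is certainly not how Google does it. Note that fuzzy matching alone
gets you from "Mark Tvein" to "Mark Twain" but not to the pseudonym;
that takes the extra lookup described above.

```python
import difflib

# Hypothetical list of known names; a real system would use a far larger index.
KNOWN_NAMES = ["Mark Twain", "Jane Austen", "Charles Dickens"]


def did_you_mean(query, names=KNOWN_NAMES, cutoff=0.6):
    """Return the known name closest to the query, or None if none is close enough."""
    # Compare case-insensitively, but return the name in its original casing.
    by_lowercase = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None


if __name__ == "__main__":
    for typed in ("Mark Tvein", "Jane Austin", "Quintus Curtius Snodgrass"):
        print(typed, "->", did_you_mean(typed))
```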

This is not hard, and very soon you'll find that software will do this
line of lookup for you (look into all the various semantic wikis that
are popping up these days; inferencing pseudonyms is extremely
easy). I can probably write a script in a couple of days that will do
this, and I'm quite confident about this because, indeed, I already
have; a couple of months back (doomed to forever sit in my archive, I
suspect) I wrote something called WikiedSearch, which is a web service
that, for every search you do in a catalog, will also look it up in
the Wikipedia corpus (11 GB of XML), which I've pre-analyzed in some
ways (is-a relationships, see-also, pseudonyms, further reading, that
sort of stuff), and will *indeed* use Google to do further searching
as well as further searching within the catalog (building a search
list, and then deploying that to whatever search service you like
next). This stuff ain't that hard to do anymore.
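
To make the "building a search list" idea concrete, here is a minimal
Python sketch. It is not the WikiedSearch code: the RELATIONS table,
its keys, and both function names are hypothetical stand-ins for
whatever the pre-analysis actually produced, and search_fn is any
callable you supply (a catalog search, a web search, and so on).

```python
# Hypothetical, hand-written stand-in for relations pre-extracted from an
# encyclopedia dump; the real data and its format are not shown in this post.
RELATIONS = {
    "Mark Twain": {
        "pseudonyms": ["Quintus Curtius Snodgrass"],
        "see_also": ["Samuel Langhorne Clemens"],
    },
}


def build_search_list(query, relations=RELATIONS, kinds=("pseudonyms", "see_also")):
    """Return the original query followed by related terms, without duplicates."""
    terms = [query]
    entry = relations.get(query, {})
    for kind in kinds:
        for term in entry.get(kind, []):
            if term not in terms:
                terms.append(term)
    return terms


def expanded_search(query, search_fn, relations=RELATIONS):
    """Run search_fn once per term in the search list; map each term to its results."""
    return {term: search_fn(term) for term in build_search_list(query, relations)}


if __name__ == "__main__":
    # A fake search service that just echoes the term it was asked for.
    print(expanded_search("Mark Twain", lambda term: ["result for " + term]))
```

A query with no entry in the table simply falls through as a
one-item search list, so the ordinary catalog search still happens.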

...

> Now, I could continue to look at the records for "dogs" at this point,
> or actually think, "Oh! I really want fighting dogs." and go there
> immediately.

How is this different from searching Google for "dog", thinking "I
really want fighting dogs", and searching for that instead? I'm not
following your straw man arguments here.

> http://catalog.princeton.edu/cgi-bin/Pwebrecon.cgi?Search_Arg=dogs&Search_Code=SUBJ_&CNT=50&HIST=1
>
> As you look at this, you will find headings that most probably would
> never occur to you, such as "Dogs as laboratory animals," and other
> concepts such as "Dogs--Japan," "Dogs--War use of." Something
> might even interest you enough to look at one of them more closely,
> but what is more important is that in just a few minutes, you have
> looked at *everything* about dogs (i.e. a concept search) in the
> collection of one of the great libraries in the world.

No I haven't, I've only been presented with a list of subjects that
catalogers think items in their collection might have relevance to.
I'm sure there's thousands of dogs and topics about them that are in
the collection but not in the meta data, and I suspect a lot of what's
listed are mistakes, as well. Of course, innocent patrons don't know
this.

Also, I'm not sure I'd use a list of subject headings as a guide to
"what's out there" on any subject, but of course, that's because I
have some knowledge of subject headings. :)

> ... It was painless,

Dogs--Anatomy & histology--Atlases.

Quick, what do you think that subject might be? It's only a painless
experience if you know what these subjects might mean, but I'm afraid
that most people wouldn't know much about LCSH.

> it was quick,

I was looking for dog breeds, especially the Shetland Sheepdog, but
the listing says that there were none (0) for "dog breeds." Only
after fiddling around for a while did I gather that "more info" might
bring up something other than 0 results; it gave me a list of generic
"dog breeds", but nothing about actual dog breeds. I poked around
some, and didn't really find anything related. What's your definition
of "quick", by the way?

> ... and it was complete.

Dogs--Bibliography.
Dogs--Bibliography.
...
Dogs--Drama.
Dogs Drama.

It was more than complete; it was misleading, with duplications, and I
bet my boots lots were missed and some were mis-cataloged.
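
Spotting this sort of duplication is mechanical, for what it's worth.
A minimal Python sketch, assuming that headings differing only in
case, dashes, punctuation and spacing should count as the same
heading (the normalization rule and both function names are made up
for the example, not anything a catalog vendor ships):

```python
import re
from collections import defaultdict


def normalize_heading(heading):
    """Lowercase a heading and collapse dashes, punctuation, and spacing."""
    text = heading.lower().replace("&", " and ")
    # Any run of characters other than a-z or 0-9 becomes a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def find_duplicates(headings):
    """Group headings by normalized form; keep groups with more than one member."""
    groups = defaultdict(list)
    for heading in headings:
        groups[normalize_heading(heading)].append(heading)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    sample = [
        "Dogs--Bibliography.",
        "Dogs--Bibliography.",
        "Dogs--Drama.",
        "Dogs Drama.",
        "Dogs--Japan.",
    ]
    for key, members in find_duplicates(sample).items():
        print(key, "<-", members)
```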

Please don't make bombastic statements about our glorious meta data
and catalogs like that; it's just too easy to throw bricks through
that window, and you're looking like the fool. Our catalogs are
littered with rubbish, and even though there's gold in them mountains
it's not as much gold as there is rock, and you've got to dig for
months to find the specks.

> Try searching Google for dogs and see what you get: rude
> references to women and probably politicians, porno and
> who knows what else, and you'll spend hours going through
> them all, but it will all be arranged by this magic of "relevance."

Oh, for @^$%@@%$ sake, what a stupid, ignorant, wishful and sad
statement. I suggest *you* search for "dogs" in Google and see what
comes up:

I got a full page of *only* relevant links to where I can get dogs,
info on them, news items with them, pictures of one, and I got
"Searches related to: dogs : pictures of dogs, dogs types, information
about dogs, dogs health, facts about dogs, dog games, dog names, and
adopt a dog", so please take your catalog and stuff it up your too
large ego.

This is why Casey was depressed; you've got your head so far up the
library backside you can't see reality for what it is, right there in
front of you. I'm getting a bit tired of this kind of silly discourse,
so I'll just leave it here before I too leave in disgust and depression.
And people wonder why I left the library world ... *grumble*


Alex
--
---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------
Received on Sat Jan 05 2008 - 16:34:01 EST