> Alexander Johannesen wrote:
> > We use the indexes to cluster information, so in that
> > sense, you can browse around in as many clusters as
> > you possibly can want
On 6/22/06, Bernhard Eversberg <ev_at_biblio.tu-bs.de> wrote:
> That's very nearly not what I had in mind. A cluster contains
> only what in some way matches a query.
Not at all; a cluster can be defined by a number of things, such as
sub-queries, various states (user state, db state, current path,
current subject), statistics of popularity ... only our imagination
can stop us.
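To make that a bit more concrete, here is a rough sketch (plain Python, every name and record in it is made up by me, not taken from any real OPAC) of what I mean by a cluster being nothing more than a stored sub-query plus whatever state we care about:

# Rough sketch only: a "cluster" as a stored sub-query plus context
# filters. All names and data here are illustrative, not a real system.
def define_cluster(name, subject, user_language=None, min_loans=0):
    def select(records):
        for rec in records:
            if subject not in rec["subjects"]:
                continue
            if user_language and rec["language"] != user_language:
                continue
            if rec.get("loans", 0) < min_loans:
                continue
            yield rec
    return {"name": name, "select": select}

records = [
    {"title": "Norwegian folk tales", "subjects": ["folklore"],
     "language": "no", "loans": 12},
    {"title": "Folklore of the Pacific", "subjects": ["folklore"],
     "language": "en", "loans": 3},
]
cluster = define_cluster("popular folklore", "folklore",
                         user_language="no", min_loans=5)
print([rec["title"] for rec in cluster["select"](records)])

The point is simply that nothing in that recipe requires a literal query from the user; the cluster can be driven by whatever state or statistics we choose.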
> Whereas a real index shows everything that
> is actually there in an alphabetic environment, like the
> example I provided, with no restrictions on going up and
> down from there.
Real indexes? :) What are those? Back-of-book indexes? A-Z lists of
something? An index these days is a fairly vague notion.
> An NG OPAC need not have browsable, visible indexes.
Again, that depends on what an index is.
> > Of course it does; statistical error pruning is exactly
> > what Google does to give suggestions.
> Only above certain thresholds. Rare words or names are not
> covered by this.
That's simply not true. If a rare word *exists* at all, it can be
suggested, and there is no difference between a word and a name. Have
you tried these things? I'm a Norwegian going through mostly
Australasian materials; I've tried it out, and it works quite well.
Sure, there's *always* that little pesky thing you can't do, but that
applies equally well to Google, to reference librarians and to my own
brain. We cannot expect perfection from our systems, because such
perfection does not exist, authority records or not.
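To be concrete about why rarity isn't the issue: a suggester only needs the term to exist in the index once; frequency matters for ranking, not for eligibility. A toy sketch of the idea (mine, nothing to do with how Google actually implements it):

# Toy sketch of a "did you mean" suggester over an index term list.
# If a rare word or name exists in the index at all, it can come back
# as a suggestion; how often it occurs only affects ranking.
import difflib

index_terms = ["eversberg", "everything", "australasian", "australian"]

def suggest(query, terms, n=3):
    # get_close_matches returns the nearest terms, most similar first.
    return difflib.get_close_matches(query.lower(), terms, n=n, cutoff=0.8)

print(suggest("Eversburg", index_terms))    # the rare name still comes back
print(suggest("Austrlasian", index_terms))  # ['australasian', 'australian']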
> And we need not even do experiments ourselves. "Google
> Book Search" is presumably trying very hard to do exactly
> this. Let's look, once it comes out of "beta".
I'm always keen to see what Google is up to. My favourite example is
the way they solved really good translation: not by semantic parsing
or stemming or dictionaries or anything like that, but through pure
statistical analysis of swathes of EU-translated texts; find the
French version of "I am but a wee boy" in those documents, and you're
mostly done. It's done through statistics, and it works bucketloads
better than most other systems, was helluva cheap, and lightning fast
to implement. And that's the key: approach a problem thinking "what is
the simplest thing that might work?" I love it.
> There *is* a body of experience with legacy systems behind
> us.
Experience with stuff that has since changed may well have become
useless. Do we know?
> But of course, what can be tested, should be tested,
> AGWS.
Everything can be tested, so we need to be a bit careful. :)
Alex
--
"Ultimately, all things are known because you want to believe you know."
- Frank Herbert
__ http://shelter.nu/ __________________________________________________