Re: Resignation

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Fri, 7 Sep 2007 05:57:03 +1000
To: NGC4LIB_at_listserv.nd.edu

On 9/7/07, Sperr, Edwin <sperr_at_nelinet.net> wrote:
> Getting back to a discussion a little ways up-stream, Alexander and I
> recently had the exchange...

Holy smokes, Batman! That ain't recently ; In Internet time, that's
like last year. /disclaimer

Anyways ;

> (alexander, earlier)
> ** I've argued before that seriously smart systems can easily subtract
> ** most normal metadata from free-text versions of any book or paper,
> ** including TOCs, subjects, contextual domains and quotations...I know
> ** several people here have said they don't think AI or smart
> ** software is up to the task, and that they represent no real risk to
> ** serious cataloging. Well, I can only say, please trust me in this! I
> ** worked in professional AI for over 7 years ; this stuff not only can
> ** be done, but has been done for some time.
>
> No it hasn't "been done for some time" -- not yet, and probably won't be
> that soon, either.

Did you read any of the correspondence on this subject that has
passed since that statement? There's been a lot of talk about it, and
it certainly hasn't been one sided with only me saying 'yay'.

> I don't mean to pick on you in particular; it's just
> that there seems to be a notion in some quarters (including places where
> folks are making funding decisions) that human-derived metadata is
> *already* obsolete.

Not obsolete per se, but the "MAchine" part of MARC has been
forgotten over the years, and the meta data available to us now is
extremely difficult to use out of the box in smart software systems.
And, this is my point, since libraries aren't going to use computers
less, we should perhaps make sure that computers can work with our
meta data just "as well" as librarians can. Anyone who indexes MARC
data and tries to use that for clever searching knows this ; the meta
data is ok, but not good enough, and everything needs to be FRBRised
as a *minimum*, then filtered and prodded, then mangled, and then a
bit of guesswork on top. (And I share Karen Coyle's utter frustration
that FRBR isn't even proven to be the right thing to do yet, after,
what, 15 years or so? That's quite amazing and alarming at the same
time)

> Look, I really don't care about the "poignancy of
> the topic" -- my wish that we not discard traditional cataloging too
> quickly is because I want things to continue to work at *least* as well
> they do now.

And if you still think I'm discarding any library legacy, then you
haven't read what I write very well ; I argue quite often here about
extending the legacy, to build on it, that in fact the library legacy
is an important piece to enable success where others might lose.
There are many ways and methods of using that legacy, though.

> There's certainly disagreement on the pace of advance in AI,

So far the negatives to AI's merits are from people who have little to
no background or deep knowledge of it (as far as I've seen), which I
guess one can understand, but it doesn't ease my frustrations.

> what
> constitutes "real" AI and the relative merits of Librarians and Computer
> Scientists.

There's fake AI? And "relative merits"? What do you mean by that?

> If I can suggest one thing, it would be that instead of
> worrying about all this or about what might happen in the future, we
> concentrate on the here and now.

You mean, maintaining the status quo?

> I think we all can agree that
> statistical analysis and other computational techniques can be really
> useful adjuncts to traditional cataloging, and where the full-text
> exists to be parsed, they should be experimented with.  Indeed, that's
> already happening at the World Bank and NLM.  Any stories from other
> places?

Yes, I've posted a follow-up to this mail with at least one such project.

...

> However, keep in mind the fact that there are tens of millions of
> existing records that might never get juiced with whatever new
> technologies and techniques we come up with (fat chance we're getting
> the full-text to play with for any works that are still in copyright...)

Of course they will, unless you mean something weird with "juiced". To
me it means that we put it all in the blender (same system), and
there's lots of added goodness that can be done with the existing meta
data as well as cross-pollination between items with full text, with
just a TOC, and with neither, to help flesh out the records of the
'neither' category.

> If you are to build a catalog that encompasses both old and new, then
> *all* records need to have some points of commonality if you're going to
> search the pile successfully.  Think of it like a pidgin language used
> for trade on a far frontier.

Not sure I get your point here. We do this already in any large system
around, mixing up identities and meta data, making some persisted
sense of it all. It's what computer systems do, really.

> > I also asked because why subject headings? Why are subject headings
> > the goal?  Surely there's better things to model if you've got the tools
> > to do it. Why aim low? Is it the law of conservatism? That pesky reality?
>
> Because controlled-vocabulary ontologies are still the best system
> available for modeling the "aboutness" of an item.

Got any evidence of that? Because auto-generated ontologies are quite
hot these days ... :) Frankly, it's all quite subjective, and in this
case certainly up to the scrutiny of librarians. Sometimes
controlled vocabs are best, sometimes they're rubbish. It's all
contextual.

>  Note that
> "aboutness" is different from just saying that document X has a
> similarity score of 678 to document Y.

Actually, "aboutness" as one parameter doesn't have to be more than
that. What you mean is that "aboutness" is multiple params. Yes, sure.
What's your point? Pointing out that the opposite then must be "X has
a similarity score of 678 to document Y" is just building your very
own straw man you can easily burn.

Regards,

Alex
--
 ---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------