Re: Resignation

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Thu, 30 Aug 2007 19:43:13 +1000
To: NGC4LIB_at_listserv.nd.edu

On 8/30/07, Weinheimer Jim <j.weinheimer_at_aur.edu> wrote:
> I think it's a fabulous and very interesting time right now.
> Do I think any of the ideas I have will be tried anywhere?
> Probably not, but it's still a very unique time to exercise
> the imagination.

About cataloging: There's something I feel I need to say on this
topic, because I feel it's being undermined in this whole debacle:
the future of human cataloging will be minuscule. I've argued before
that seriously smart systems can easily extract most normal metadata
from free-text versions of any book or paper, including TOCs,
subjects, contextual domains and quotations. Any serious publisher
has electronic versions of the stuff they publish, and running these
through cataloging software is a doddle. It will be the norm of the
future.

I know several people here have said they don't think AI or smart
software is up to the task, and that it represents no real risk to
serious cataloging. Well, I can only say, please trust me on this! I
worked in professional AI for over 7 years; this stuff not only can
be done, but has been done for some time. Most AI problems have
historically been with access to the various corpora needed to make
them work, but that is now changing dramatically. Now that all books
are written on computers and available for analysis, even more so.

The reason this isn't in widespread use quite yet is that there
hasn't been much money in it, so it's been mostly an academic venture
with lots of interesting but underfunded projects that sit in a
portfolio but never get off campus. But Google and Amazon have
changed that. If you look closely at what sort of people Google has
hired for the last few years you'll see a lot of AI people in there,
and for good reason too; he who controls the cataloging, controls the
flow of meta data. And from that you can make lots of money. And when
corporations make better, faster, more meta data than us, who cares
about what we do? Who cares about Slow Meta Data at that point?

AI systems will give a contextual subject heading and quote breakdown
per chapter or paragraph, if you like, with links to domain models and
to other items related by similarity, publisher, topic, author, field,
etc, measured by time, acceptance, quotations and more. And we, we
still argue whether to put the TOC in the MARC field or not. Trust me,
the meta data of the future will *not* be in MARC, because MARC simply
can't fit it in, nor is it structured for it.

So, the quality of meta data really is what it comes down to, and
right now, because we're tiresome librarians, we've got supposedly good
meta data. But anyone who sits down with a fully indexed set of
subject headings and plays seriously with it finds flaws in it on a
search by search basis; it's a very rigid and sometimes random piece of
work. Human cataloging is, well, human, and makes a lot of mistakes
which will be especially troublesome when computers try to use them as
is. If we're to reap the benefit of the current fuzzy meta data, we
need to find more fuzzy means of using them, changing them to more
computer-friendly formats, because the current rigid indexing systems
fail the litmus test.

I still love you, though. :)

Regards,

Alexander
--
 ---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------