Re: Hot (MARC) metadata!

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Tue, 7 Aug 2007 14:34:15 +1000
To: NGC4LIB_at_listserv.nd.edu

Hi all,

First of all, I'm rather disappointed that this thread has died ; I
think I meant it as a litmus test for where we librarians (or
wannabes) think the juice of our future systems really sits. If we
ourselves can't point to the fantastic metadata our systems should /
would / could use I suspect we're already dead.

Of course, I'm not saying we all must understand MARC to the fullest ;
we're not all catalogers, but we should at least be able to say that
"title + author + subject headings combined" is what we've got so far,
and that, perhaps, our subject headings are indeed what sets us apart
from Amazon, Google, et co. (Or something else; this was an example)

If nothing else, I fear this all points to something more and more
librarians see ; we don't know what we're doing in the digital domain,
and we don't know what the remedy might even look like. Some still say
that in the digital world we play a part as we are right now, in the
shape of people who know something loosely about rectangular
paper-based objects.

My purpose these days, as said before, is about serious analysis of
what we've got instead of the current real-time search / browse
indexing we're mostly doing ; we need to come up with data models that
are better than what currently exists, Google or otherwise. We need to
put that fantastic human knowledge we have into a computer-based
framework, either to mimic or enhance what we currently have in human
form. How can we take that human knowledge and make it available to
all through computer interfaces? That really is what this list is all
about, but I see far less of its applicability than I see whining
about what is currently wrong (and boy am I guilty of this latter
one).

Anyway, some more specifics ;

On 8/1/07, Rinne, Nathan (ESC) <RinneN_at_district279.org> wrote:
> There is much we agree on here - to my reading however, you have more
> faith in technology to understand human language than I do.

Perhaps, but I did spend 8 years of my life in the high-end artificial
intelligence business as an R&D developer where I did a lot of stuff
that truly rocks if given the right models and cooperation from the
data (in our case, the people).

>> " So, for us they are great tools, but for normal sane people they can
>> be a huge constraint."
>
> This is why I make the analogy between a librarian and a doctor.  Each
> has specialized tools, technological, physical, mental, etc. (sometimes
> costly) that treat "rare conditions" (for libs, in the case of curious
> scholars who want tools that can help them dig very deep) - we don't
> expect everyone to be perform specialized surgery on themselves, so why
> should it be that different here?  Granted - we do need to make
> "surgery" easier for our users who want to attempt it - which is many.
> We must!

Indeed. But allow me to take your analogy a bit further ; whenever I
or my family get some kind of ailment I use a number of online tools
to find out more about it. Most of the time, depending on the
seriousness of the ailment, I can replace the doctor, other times
(more on the once in a while side) I go to the doctor (mostly for
prescriptions and confirmation that I'm roughly right). But all in
all, I go to the doctor far less than I used to. More and more
information gets wrapped in better interfaces all the time, and at
some point the threshold for a mere mortal such as myself to
understand the basics of medical practice gets low enough for me to
have confidence in my own findings. And I'm not dead yet.

Now, there's nothing new or fancy in doing this ; we all do it, read,
research, and go with our current understanding to get to the info
we're after. Why should the library world be any different? And the
key to making people better understand the value of this profession is
to make interfaces they understand. It's not that hard, but it does
take a brave step on our side to open up.

> Ok, so as the devils advocate ; *why* should it [the library] survive?
> With all these technologies and possibilities, what do the human
> librarian add to the future of knowledge management?... What if
> [librarians] are irrelevant?... computers, software and algorithms are
> going to be increasingly clever at finding, cataloging and deliver
> stuff... Tools that help us catalog better ; that's what you need to
> attack, not... those who recognize that human-maintained systems
> are on the path to doom and destruction for the library profession. (end)
>
> Re: the doom and destruction born of an over-reliance on
> human-maintained systems, is it that obvious to you, and if so, why?

Yes, it's very obvious ; the information librarians should handle has
far outgrown us, and we're certainly not funded to do that job. The
mountain of information (including books) is far beyond what we're
able to cope with, even with copy cataloging ; the time we have
available to deal with one book is always getting less and less, which
means the quality and intellectual effort put into cataloging per book
goes down.

That's just the practical side of things. Next is what people
increasingly search for. Our search patterns change, just as Google
and others surpass us in technology. In the past we might have gone to
the librarian, she/he suggests a subject heading, and we browse through
the catalog in search of what we're after. With Google Books they
search the contents of the book itself (using clever algorithms to
make sure the result is relevant) and often provide a preview of that
content. How on earth can titles and subject headings compete with
that?

In a different thread someone pointed out a search for "sarjent" as an
actual spelling of "sergeant" in an older book. The chance that the book
would be cataloged with "sarjent" as opposed to modern English is
minuscule; Google wins, hands down. It's not hard to do cataloging
across free-text documents through simple statistics. Google already
do this to some extent, and there's nothing indicating they'll get
worse at it over the next few years.
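
To make "simple statistics" a bit more concrete, here is a minimal
sketch using TF-IDF, a common textbook weighting scheme. It is my own
choice for illustration and not a claim about what Google actually
runs. The function names and the three toy "full texts" are
hypothetical; real input would be whole books.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    n = len(docs)
    result = []
    for tokens in tokenized:
        if not tokens:
            result.append([])
            continue
        tf = Counter(tokens)
        # Terms frequent here but rare elsewhere get the highest score;
        # terms found in every document (like "the") score zero.
        scores = {t: (c / len(tokens)) * math.log(n / df[t])
                  for t, c in tf.items()}
        # Sort by score (descending), then alphabetically to break ties.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result

# Hypothetical toy texts, assumed purely for illustration.
docs = [
    "the sarjent marched the men and the sarjent gave orders",
    "the garden needs water and the garden needs sun",
    "the ship sailed and the ship reached the harbour",
]

for terms in top_terms(docs, k=2):
    print(terms)
```

Note that the old spelling "sarjent" surfaces as an index term on its
own, simply because it is in the text ; nobody had to catalog it.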

So, what *we* are doing is doomed ; not enough people, too much info
for any one person to deal with (even within a narrow subject field),
more and more info, more and more people out there (both creating more
and demanding more), and - perhaps the crux - our cataloging is very
limited, prone to human error and / or bad moods.

> Please see Bernhard Eversberg's response to Wayne Jone's email re:
> Martha Yee's paper, as I think he sums up my thoughts well

I'll quote something interesting from that which I think symbolises
what I reckon is where library-world is going wrong ;

> Yee writes: "A computer cannot discover broader and narrower term
> relationships, part-whole relationships, work-edition relationships,
> variant term or name relationships (the synonym or variant name or
> title problem), or the homonym problem in which the same string of
> letters means different concepts or refers to different authors or different
> works." That's true, of course, computers can't do that all by themselves,
> but human beings can program them to achieve that or a practical,
> semi-fabulous approximation.

Actually, a computer (with a bunch of clever developers in the
background) can do lots of this stuff. It's not even that hard, but
does take a bit of time. I'd like to point out the perfect example of
these old-school vs. new-school ways of doing things. Google wanted to
do a translation. One can either do it like :

a) Create complex software that knows about and analyzes a language's
grammar and dictionary, with spell-checking, slang databases, names
databases, and create enormous data models to try to work out what is
being written. Then pass all of this data into an equally complex model
for the target language, and try to decompose it to that language.
Most translation services work this way, and they're hopeless.

b) Take the EU / UN translations, index it all, and find correlations
between the languages you want translated, and statistically display
them as you go.

No need for me to say what Google did, and needless to say, the Google
way not only does a super-stellar job but seriously shook the
translation services world as well.
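
As a toy illustration of approach b), here is a minimal sketch that
learns word correspondences purely from sentence pairs that humans
have already translated. This is not Google's actual method ; the
Dice coefficient is just one simple way to measure how strongly two
words co-occur, and the function name and the tiny English / German
corpus are hypothetical stand-ins for something like the EU / UN
translations.

```python
from collections import Counter

def learn_word_pairs(aligned):
    """Map each source word to the target word it co-occurs with most."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in aligned:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Count every source/target word pair seen in the same sentence pair.
        pair_count.update((s, t) for s in src_words for t in tgt_words)
    best = {}
    for (s, t), c in pair_count.items():
        # Dice coefficient: 1.0 when the two words always appear together.
        score = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}

# Hypothetical human-translated sentence pairs (lowercased toy data).
aligned = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]

table = learn_word_pairs(aligned)
print(table["house"], table["book"])
```

No grammar, no dictionary, no slang database ; the only "knowledge" in
there is what the human translators already put into the corpus.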

Now, the key here is of course that Google took a human corpus and
found a way to harvest it and use it for the purpose in mind. Let me
underline the important part ; _human_corpus_. We will not be
successful unless we create automatic systems that do this. Title +
author + subject heading is not enough. In fact, our metadata *fields*
are not enough. Some combinations might prove effective, but will
probably not be successful, certainly not enough to save the library
profession. We need to put more effort into creating a human corpus of
library knowledge that our systems can harvest.

In the not so distant past I whined about the fact that when we do
match / merge of MARC records (just admit it, you all do it :) the
information about who changed what part of the merged fields is lost
in the process (proper change data) ; we lose precious (human) data
that we could harvest for value. Nobody seemed to understand what I
was talking about then, and I doubt many see it still, which is sad,
sad, sad.
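
To show what I mean by change data, here is a minimal sketch of a
merge that records who changed which field instead of throwing that
away. Everything here is an assumption for illustration : records are
reduced to a plain tag-to-value dict (real MARC has repeatable fields,
indicators and subfields), and the function name, log layout and
sample values are hypothetical.

```python
def merge_records(base, incoming, changed_by, log):
    """Merge incoming into a copy of base, logging every field that changes."""
    merged = dict(base)
    for tag, new_value in incoming.items():
        old_value = merged.get(tag)
        if old_value != new_value:
            # Keep the human trail: which field, old and new value, and who.
            log.append({"tag": tag, "old": old_value,
                        "new": new_value, "by": changed_by})
            merged[tag] = new_value
    return merged

# Hypothetical, heavily simplified records keyed by MARC tag.
base = {"245": "Hot metadata", "650": "Cataloging"}
incoming = {"650": "Cataloging -- Data processing", "700": "Doe, Jane"}

log = []
merged = merge_records(base, incoming, changed_by="cataloger_b", log=log)

for entry in log:
    print(entry)
```

The log is the interesting part ; it is exactly the kind of human
corpus a system could later harvest (who tends to refine which subject
headings, for example).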

> > "1)there's going to be *seriously* more information available, much more
> > than any human or human-maintained system can keep up with"
>
> I agree - and that's why I think Thomas Mann is right on with his "niche
> strategy".  Have you read his paper yet?  Talk about "hot metadata" (but
> really, as he says, people often don't realize how hot it is until
> they've been introduced to it...)!

Yes, I've read it, and I think he's dead wrong ; he's a librarian
fighting for the librarian way, which, as far as I can see, hasn't
really changed much since "computer" entered the dictionary. He talks
indeed about all that stuff that can't possibly work with computers
without providing any proof that he's right. He criticizes the LoC
report for being too technology focused without understanding that
he's been surpassed by technology himself.

Simply put, what does the librarian profession do these days? I mean,
apart from feeding the genealogists? Fewer and fewer people come into
the library. Fewer and fewer people ask us for help. Fewer and fewer
people are after true specifics (there's your niches) as even the
experts are put into question. Education and knowledge expand faster
than what this profession is willing to put up with or deal with.

> This does not preclude digitization, user tags, working closely together
> with other metadata communities, adopting new rules and formats, etc. -
> it just means that the library community shouldn't sell its "birthright"
> [...] for a bowl of porridge.

Why not? If the library won't change direction and is heading for its
doom, why not get a bowl of yummy porridge? I like porridge. :)

> (primarily, its highly detailed subject headings which though not
> perfect, really do represent knowledge gained by those who have gone
> before us by doing hard work, born of curiosity and wonder, making
> contact with the world out there, according to the best practices of
> their various disciplines - as there are catalogers even who specialize
> in these disciplines)

Yes, I'm getting more and more afraid that our subject headings are the
only thing we regard as valuable in our metadata sets. As such, they
are not good enough to just index.

> Of course, when the bar has been lowered so much in public education
> (try getting a good, well-rounded liberal arts education today), I admit
> my arguments sound less and less cogent.

The view on not only education but *how* people these days learn and
use their education certainly has for me changed the way I develop
systems. No more databases, search engines and browsing subject
headings ; I want to tap into more human knowledge, and for that
there are blogs, comments, podcasts and the *content* of books. So. What
do we do next?

Alex
--
 ---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------