Re: Hot (MARC) metadata!

From: Rinne, Nathan (ESC) <RinneN_at_nyob> Date: Tue, 7 Aug 2007 10:47:20 -0500 To: NGC4LIB_at_listserv.nd.edu

Will,

OK, my last post for the day - and I think for at least 3 weeks. (sorry
Alexander, if you were actually looking forward to a response :) ).

Thank you so much for the detailed description of what you do - it is
very helpful, and I am filing this one away.  Really appreciated.

You're right about conflating AI and RDF - I should not do that (yes, I
could do more reading here, by the way).  So just change my post to the
"salvific claims of technology" :)

As for the rest, great stuff - yes, I understand the advantages here you
are speaking of (this stuff should be utilized more and more to help
serve the patron, as I've pointed out Mann has mentioned on many an
occasion) and I think there are efforts happening right now with RDA in
order to make this feasible for libraries too.

Finally, you close:

I don't really understand why catalogers that are obsessed with subject
headings and strict cataloging rules don't immediately jump on RDF,
topic maps or something similar, since it takes everything you like
about your metadata and takes it to the next level.  I think this is
where the claims of dogmatism come in. (end).

The only "strict cataloging rules" I am obsessed about are the need for
vocabulary consistency (agreement - which I suppose must translate to
some degree of bureaucracy), the importance of detailed, descriptive
headings (which means experts), and the corresponding ability, in
theory, to see "everything" on a particular topic in the context of
things like it (the "whole elephant") (which means context, including
"interdisciplinarity").  That is it.  If this is dogmatism, I suppose I
am dogmatic.

Maybe I am wrong, but I don't have the confidence that other folks on
this list really understand my concerns.  I would love to be proven
wrong, although I am not even sure at this point where to start
in such a discussion.

Having just read Eric Lease Morgan's last post, however, I am not
sure people even think that discussion is worth having.
Let these treasures of the past go... let the brave new world come.  In
my mind, technology is losing its proper place (I never thought I or
Mann was anti-tech! until...), and we have become drunk with it, to go
along with our being obsessed with the "instrumentalization" of all
knowledge (which after all, will also *soon* be able to be downloaded
onto a computer)  :)

Done fighting... for now. :)

Regards,
Nathan Rinne
Media Cataloging Technician
ISD 279 - Educational Service Center (ESC)
11200 93rd Ave. North
Maple Grove, MN. 55369
Work phone: 763-391-7183

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Will Kurt
Sent: Tuesday, August 07, 2007 9:51 AM
To: NGC4LIB_at_listserv.nd.edu
Subject: Re: [NGC4LIB] Hot (MARC) metadata!

"I get really skeptical about what I consider the rather salvific claims
of AI.  I know I'm not the only one (again, I don't want to minimize real
RDF advances on the ground, but don't see why controlled subject
headings, for example, should not be a prime consideration here, in some
sense)."

Nathan, I get the sense that you could do a little more reading up on
basic semantic technologies, and I'm not entirely sure why you keep
bringing AI (at least in the 'strong AI' terms you seem to be
describing) into the discussion about RDF, Topic Maps etc.

Machine-readable metadata doesn't mean AI (at least not in the
'upload our brains into a computer' sense); it means it's easier for
computer programs to parse out data that makes sense to humans.  MARC
is itself a machine-readable format: it's not difficult for a program
to figure out where the subject heading is, and where the title
is.  (I'll explain where MARC falls short in a second.)  The problem
is that current web metadata is very difficult to parse out and make
sense of.  Let me give you a concrete example:

I run a library job site for Massachusetts that aggregates all of the
job postings from the major library job sites in the area.  One way
for me to get these job postings is to manually go to each page once
a week and copy and paste the data into my site.  This can go from
being excessively time consuming to essentially impossible given
other responsibilities in my life.

The next thing I can do (which is what I, in fact, do) is to find a
way to extract the data from the other job sites.  But how do you
write a program that finds job postings? You can't just tell the
computer to grab the jobs.  So what you have to do is examine the
underlying structure of the HTML on the site (which tells you very
little, if anything, about the content).  So for example I have to
say 'Okay, in every third table the second row is a job title'.  This
is also time consuming, but once you're done you can automatically
extract data very easily.  Unfortunately, if the essential structure
of the page is changed at all, everything breaks.
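
A rough sketch of the positional rule described above, using only
Python's standard html.parser module.  The sample page, the class name
and the function name are made up for illustration, and nested tables
are not handled; this is not the code behind the actual job site.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collects cell text as tables[table][row][cell] (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self.tables[-1][-1][-1] += data


def extract_job_titles(html):
    """Positional rule: in every third table, the second row is a job title."""
    parser = TableCellCollector()
    parser.feed(html)
    titles = []
    for index, table in enumerate(parser.tables, start=1):
        if index % 3 == 0 and len(table) >= 2 and table[1]:
            titles.append(table[1][0].strip())
    return titles


# Made-up page: nothing in the markup says which table holds a job.
SAMPLE_PAGE = """
<table><tr><td>Site banner</td></tr></table>
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><td>Posted this week</td></tr>
  <tr><td>Reference Librarian</td></tr>
</table>
"""

print(extract_job_titles(SAMPLE_PAGE))
```

If the site adds a single table above the banner, the same rule lands
on the navigation table and finds no title at all, which is the
fragility the paragraph above complains about.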

If these job sites used RDF or other semantic metadata I could just
look up how they described jobs (or even better, it would already be a
standard) and I literally could tell the program 'grab all the job
titles, then the descriptions'.  No magic AI involved; all that
information would be contained in the metadata. (This is part of why
so many music applications can use MusicBrainz so easily.)
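
To make 'grab all the job titles, then the descriptions' concrete,
here is a minimal sketch that stands in for RDF with plain
(subject, predicate, object) tuples.  The "ex:" names and the sample
data are assumptions for illustration, not a real published
vocabulary, and a real program would use an RDF library rather than a
Python list.

```python
# Made-up triples standing in for what a job site might publish.
# The "ex:" identifiers are hypothetical.
TRIPLES = [
    ("ex:job1", "rdf:type", "ex:JobPosting"),
    ("ex:job1", "ex:title", "Reference Librarian"),
    ("ex:job1", "ex:description", "Staff the reference desk."),
    ("ex:job2", "rdf:type", "ex:JobPosting"),
    ("ex:job2", "ex:title", "Cataloger"),
    ("ex:job2", "ex:description", "Original cataloging of media."),
    ("ex:page", "ex:title", "Library jobs in Massachusetts"),
]


def first_object(triples, subject, predicate):
    """Return the first object recorded for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def grab_jobs(triples):
    """Find everything typed as a job posting, then read title and description."""
    jobs = [s for s, p, o in triples if p == "rdf:type" and o == "ex:JobPosting"]
    return [
        (first_object(triples, job, "ex:title"),
         first_object(triples, job, "ex:description"))
        for job in jobs
    ]


for title, description in grab_jobs(TRIPLES):
    print(title, "-", description)
```

The page's own title is skipped because the query asks for things
declared to be job postings, not for whatever sits in a given table
row.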

Okay, but that's just the tip of the iceberg, and something that can
also be done by APIs. Another important thing in semantic metadata is
that it maps out the relationships among the entities described in
the metadata.  For example, each MARC record literally contains the
text of the LC subject heading; an RDF document would simply point to
the entity of the subject heading, which itself could point towards
other things.
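
The difference between carrying the heading's text and pointing to the
heading as an entity can be sketched as follows.  The identifiers, the
book title and the broader/narrower relationship shown here are
invented for illustration and are not taken from the actual LC
authority file.

```python
# MARC-style: the record carries the heading as a string of text.
marc_style_record = {
    "title": "A made-up book about catalogs",
    "subject": "Library catalogs",
}

# Linked style: the record points at an identifier, and the heading is
# its own entity that can point onward (hypothetical identifiers).
HEADINGS = {
    "ex:heading/catalogs": {"label": "Catalogs", "broader": None},
    "ex:heading/library-catalogs": {
        "label": "Library catalogs",
        "broader": "ex:heading/catalogs",
    },
}

linked_record = {
    "title": "A made-up book about catalogs",
    "subject": "ex:heading/library-catalogs",
}


def heading_chain(headings, heading_id):
    """Follow 'broader' pointers from a heading up to the top of its chain."""
    labels = []
    while heading_id is not None:
        entry = headings[heading_id]
        labels.append(entry["label"])
        heading_id = entry["broader"]
    return labels


print(heading_chain(HEADINGS, linked_record["subject"]))
```

With the string version a program only has the text to match against;
with the pointer it can walk to whatever the heading itself links to.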

So in my job site example, suppose a job was posted by a local
university.  This would be contained in the metadata, and I could
literally tell a program to simply go get the university info as
well, and it's easy to imagine that the university would link to
other valuable metadata, etc.  All of this info would be very, very
easy to extract, and simple decision trees in a program could yield
amazing results.
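
A small sketch of 'go get the university info as well', assuming the
metadata is available as a dictionary of linked entities.  Every
identifier, property name and value below is hypothetical.

```python
# Made-up linked metadata: a job points to its employer, which points onward.
GRAPH = {
    "ex:job1": {"ex:title": "Reference Librarian", "ex:postedBy": "ex:univ1"},
    "ex:univ1": {"ex:name": "Example University", "ex:locatedIn": "ex:town1"},
    "ex:town1": {"ex:name": "Exampleville"},
}


def follow(graph, start, *predicates):
    """Walk a chain of properties from start; return None if a link is missing."""
    node = start
    for predicate in predicates:
        if node not in graph or predicate not in graph[node]:
            return None
        node = graph[node][predicate]
    return node


employer = follow(GRAPH, "ex:job1", "ex:postedBy", "ex:name")
town = follow(GRAPH, "ex:job1", "ex:postedBy", "ex:locatedIn", "ex:name")

# A very simple decision rule layered on top of the extracted data.
if employer is not None and town is not None:
    print(f"{employer} ({town})")
```

Each hop is just a lookup by name, so the program needs no knowledge of
how any of the pages involved happen to be laid out.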

I don't really understand why catalogers that are obsessed with
subject headings and strict cataloging rules don't immediately jump
on RDF, topic maps or something similar, since it takes everything
you like about your metadata and takes it to the next level.  I think
this is where the claims of dogmatism come in.

--Will

At 08:51 AM 8/7/2007, you wrote:
>Alexander:
>
>"No more databases, search engines and browsing subject headings ; I
>want to tap into more human knowledge, and for that there's blogs,
>comments, podcasts and the *content* of books. So. What do we do next?"
>(end)
>
>Alexander, I hope to be able to give you a more substantial response
>myself to your piece later on this week, but for now, let me just say I
>get really skeptical about what I consider the rather salvific claims
>of AI.  I know I'm not the only one (again, I don't want to minimize real
>RDF advances on the ground, but don't see why controlled subject
>headings, for example, should not be a prime consideration here, in
>some sense).
>
>No, I'm pretty skeptical.  After all, some people *really* want to "tap
>into human knowledge".  When Marvin Minsky, who heads up the Media Lab
>from ever-so-reputable MIT, suggests that increases in AI will soon
>allow us to upload the minds of the worlds billions, running them on a
>computer that costs only a few hundred dollars - and thereby solving
>the world's population problem, I balk just a little bit.  :)
>
>Yikes.  Give me Thomas Mann and Martha Yee any day (yes - straw man
>argument - or not? :) ) - at least they seem to be talking about things
>that are real.  Go ahead and accuse me of a lack of imagination!
>
>Regards,
>Nathan Rinne
>Media Cataloging Technician
>ISD 279 - Educational Service Center (ESC)
>11200 93rd Ave. North
>Maple Grove, MN. 55369
>Work phone: 763-391-7183