Re: Hot (MARC) metadata!

From: Will Kurt <wkurt_at_nyob>
Date: Tue, 7 Aug 2007 10:51:14 -0400
To: NGC4LIB_at_listserv.nd.edu

"I get really skeptical about what I consider the rather salvific claims of
AI.  I know I'm not the only one (again, I don't want to minimize real
RDF advances on the ground, but don't see why controlled subject
headings, for example, should not be a prime consideration here, in some
sense)."

Nathan, I get the sense that you could do a little more reading up on
basic semantic technologies, and I'm not entirely sure why you keep
bringing AI (at least in the 'strong AI' terms you seem to be
describing) into the discussion about RDF, Topic Maps, etc.

Machine-readable metadata doesn't mean AI (at least not in the 'upload
our brains into a computer' sense). It means it's easier for computer
programs to parse out data that makes sense to humans. MARC is itself
a machine-readable format: it's not difficult for a program to figure
out where the subject heading is and where the title is (the title
sits in field 245, a topical subject heading in field 650). I'll
explain where MARC falls short in a second.  The problem is that
current web metadata is very difficult to parse out and make sense
of. Let me give you a concrete example:

I run a library job site for Massachusetts that aggregates all of the
job postings from the major library job sites in the area.  One way
for me to get these postings is to manually go to each page once a
week and copy and paste the data into my site, but that ranges from
excessively time-consuming to essentially impossible, given the other
responsibilities in my life.

The next thing I can do (which is what I, in fact, do) is find a way
to extract the data from the other job sites automatically.  But how
do you write a program that finds job postings? You can't just tell
the computer to grab the jobs.  Instead, you have to examine the
underlying structure of the HTML on the site, which tells you very
little, if anything, about the content. For example, I have to say
'Okay, in every third table the second row is a job title.' This is
also time-consuming, but once you're done you can extract the data
automatically and very easily. Unfortunately, if the essential
structure of the page changes at all, everything breaks.
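A minimal sketch of that positional approach, using only Python's standard-library html.parser. The function name scrape_job_titles and the sample markup are hypothetical, "every third table" is read here as the 3rd, 6th, 9th (and so on) table on the page, and nested tables are not handled. The second call shows how a single extra table at the top of the page quietly breaks the rule.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects cell text, grouped as tables -> rows -> cells (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row is a list of cell strings
        self._cell = None  # text fragments of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.tables and self.tables[-1]:
                self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_job_titles(html):
    """Hypothetical positional rule: in every third table, row 2 holds a job title."""
    parser = TableCollector()
    parser.feed(html)
    titles = []
    for table in parser.tables[2::3]:        # 3rd, 6th, 9th ... table
        if len(table) >= 2 and table[1]:
            titles.append(table[1][0])       # first cell of the second row
    return titles


page = (
    "<table><tr><td>Navigation</td></tr></table>" * 2
    + "<table><tr><td>Posted 8/1</td></tr><tr><td>Reference Librarian</td></tr></table>"
)
print(scrape_job_titles(page))             # ['Reference Librarian']

banner = "<table><tr><td>Announcement</td></tr></table>"
print(scrape_job_titles(banner + page))    # [] : the positional rule no longer lines up
```

Nothing in the markup says "this is a job title", so the rule has to be rewritten whenever the layout moves.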

If these job sites used RDF or other semantic metadata, I could just
look up how they described jobs (or, even better, there would already
be a standard), and I could literally tell the program 'grab all the
job titles, then the descriptions.'  No magic AI involved: all that
information would be contained in the metadata. (This is part of why
so many music applications can use MusicBrainz so easily.)
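To make the contrast concrete, here is a minimal sketch that models metadata as plain (subject, predicate, object) triples in Python, with no RDF library. The namespace http://example.org/vocab/job# and the job data are hypothetical; a real system would more likely use an RDF toolkit such as rdflib and a SPARQL query. The program asks for a property by name rather than by position on the page.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# The vocabulary namespace below is a hypothetical placeholder, not a real job schema.
JOB = "http://example.org/vocab/job#"

triples = [
    ("http://example.org/jobs/1", JOB + "title", "Reference Librarian"),
    ("http://example.org/jobs/1", JOB + "description", "Staff the reference desk."),
    ("http://example.org/jobs/2", JOB + "title", "Cataloger"),
    ("http://example.org/jobs/2", JOB + "description", "Original and copy cataloging."),
]


def values_of(graph, predicate):
    """Return (subject, object) pairs for every triple that uses `predicate`."""
    return [(s, o) for s, p, o in graph if p == predicate]


# "Grab all the job titles, then the descriptions."
titles = values_of(triples, JOB + "title")
descriptions = dict(values_of(triples, JOB + "description"))
for job, title in titles:
    print(title, "-", descriptions.get(job, ""))
# Reference Librarian - Staff the reference desk.
# Cataloger - Original and copy cataloging.
```

Because the query names the property, reordering the triples or redesigning the HTML page they came from does not change the result.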

Okay, but that's just the tip of the iceberg, and something that can
also be done with APIs. Another important feature of semantic
metadata is that it maps out the relationships among the entities it
describes.  For example, each MARC record literally contains the text
of the LC subject heading, whereas an RDF document would simply point
to the subject heading as an entity, which itself could point toward
other things.
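A small sketch of the difference between a MARC-style literal and an RDF-style link. The Dublin Core terms (dcterms:title, dcterms:subject) and SKOS (skos:prefLabel, skos:broader) property URIs are real vocabularies; the book and subject URIs, and the broader-term relationship between them, are invented for illustration and not taken from LCSH.

```python
# MARC-style: the subject heading is text copied into the record itself.
# Simplified: real MARC splits field 650 into subfields.
marc_record = {
    "245": "Example title",
    "650": "Cataloging",
}

# RDF-style: the record points at a subject entity, which carries its own data.
DCT = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BOOK = "http://example.org/books/42"                       # hypothetical
SUBJECT = "http://example.org/subjects/cataloging"         # hypothetical
BROADER = "http://example.org/subjects/library-science"    # hypothetical

graph = [
    (BOOK, DCT + "title", "Example title"),
    (BOOK, DCT + "subject", SUBJECT),             # a link to an entity, not a string
    (SUBJECT, SKOS + "prefLabel", "Cataloging"),
    (SUBJECT, SKOS + "broader", BROADER),         # illustrative relationship only
    (BROADER, SKOS + "prefLabel", "Library science"),
]


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def label(graph, node):
    """Preferred label of a node, falling back to its URI."""
    labels = objects(graph, node, SKOS + "prefLabel")
    return labels[0] if labels else node


for subj in objects(graph, BOOK, DCT + "subject"):
    print("Subject:", label(graph, subj))
    for broader in objects(graph, subj, SKOS + "broader"):
        print("  Broader:", label(graph, broader))
# Subject: Cataloging
#   Broader: Library science
```

In this sketch the label lives in one place, on the subject entity, so every record that links to it picks up a change; a copied string has to be updated record by record.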

So in my job site example, suppose a job was posted by a local
university.  That would be contained in the metadata, and I could
literally tell a program to go get the university's information as
well, and it's quite conceivable that the university would link to
other valuable metadata, and so on.  All of this information would be
very, very easy to extract, and simple decision trees in a program
could yield amazing results.
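A sketch of the link-following idea, assuming a dict stands in for the web (in practice each URI would be fetched over HTTP). All URIs, property names, and data are hypothetical. follow_links does a bounded breadth-first walk, and the final if statement is the kind of simple decision rule described above.

```python
from collections import deque

# Hypothetical linked data keyed by URI; a stand-in for dereferencing URIs on the web.
WEB = {
    "http://example.org/jobs/7": {
        "title": "Digital Services Librarian",
        "postedBy": "http://example.org/org/example-university",
    },
    "http://example.org/org/example-university": {
        "name": "Example University",
        "type": "University",
        "locatedIn": "http://example.org/place/boston-ma",
    },
    "http://example.org/place/boston-ma": {
        "name": "Boston, MA",
    },
}


def follow_links(start, max_depth=2):
    """Breadth-first walk from `start`, following any value that is a known URI.

    Simplification: real RDF distinguishes URIs from literals explicitly;
    here a value counts as a link only if it is a key in WEB.
    """
    seen = {start}
    queue = deque([(start, 0)])
    collected = {}
    while queue:
        uri, depth = queue.popleft()
        facts = WEB.get(uri, {})
        collected[uri] = facts
        if depth == max_depth:
            continue
        for value in facts.values():
            if value in WEB and value not in seen:
                seen.add(value)
                queue.append((value, depth + 1))
    return collected


info = follow_links("http://example.org/jobs/7")
job = info["http://example.org/jobs/7"]
employer = info.get(job["postedBy"], {})

# A simple decision rule: if the employer is a university, also report where it is.
if employer.get("type") == "University":
    place = info.get(employer.get("locatedIn"), {})
    print(job["title"], "at", employer["name"], "in", place.get("name", "unknown"))
# Digital Services Librarian at Example University in Boston, MA
```

The depth limit keeps the walk from wandering indefinitely once other sites start linking to each other.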

I don't really understand why catalogers who are obsessed with
subject headings and strict cataloging rules don't immediately jump
on RDF, Topic Maps, or something similar, since it takes everything
you like about your metadata and takes it to the next level.  I think
this is where the claims of dogmatism come in.

--Will

At 08:51 AM 8/7/2007, you wrote:
>Alexander:
>
>"No more databases, search engines and browsing subject headings ; I
>want to tap into more human knowledge, and for that there's blogs,
>comments, podcasts and the *content* of books. So. What do we do next?"
>(end)
>
>Alexander, I hope to be able to give you a more substantial response
>myself to your piece later on this week, but for now, let me just say I
>get really skeptical about what I consider the rather salvific claims of
>AI.  I know I'm not the only one (again, I don't want to minimize real
>RDF advances on the ground, but don't see why controlled subject
>headings, for example, should not be a prime consideration here, in some
>sense).
>
>No, I'm pretty skeptical.  After all, some people *really* want to "tap
>into human knowledge".  When Marvin Minsky, who heads up the Media Lab
>from ever-so-reputable MIT, suggests that increases in AI will soon
>allow us to upload the minds of the world's billions, running them on a
>computer that costs only a few hundred dollars - and thereby solving the
>world's population problem, I balk just a little bit.  :)
>
>Yikes.  Give me Thomas Mann and Martha Yee any day (yes - straw man
>argument - or not? :) ) - at least they seem to be talking about things
>that are real.  Go ahead and accuse me of a lack of imagination!
>
>Regards,
>Nathan Rinne
>Media Cataloging Technician
>ISD 279 - Educational Service Center (ESC)
>11200 93rd Ave. North
>Maple Grove, MN. 55369
>Work phone: 763-391-7183