Re: Hot (MARC) metadata!

From: Ross Singer <ross.singer_at_nyob>
Date: Tue, 7 Aug 2007 12:36:50 -0400
To: NGC4LIB_at_listserv.nd.edu

Nathan,

I can't help but think that you're missing the point here.

We have a lot of data that was painstakingly produced to help /human
beings/ search and discover information with little help from
machines.  As the information landscape grows, this method becomes
more and more inefficient since it puts more and more of a burden on
the human to understand and navigate the information s/he receives.

I think what Alexander and Will are trying to get at is that if we
modeled and defined our data differently, we could push a lot of this
work to the computer, letting it make relationships between things or
guide the user through the hoops, and allowing our users to spend their
energy and resources on the bits that computers /can't/ do, such as
determining the value of a resource to the user's need.

I think the crux of this isn't a value judgement of LCSH; it's whether
or not LCSH (among other parts of our data), /while maintaining its
current level of sophistication/, could be modeled in such a way that
machines could take some of the burden off of the searcher.

Our data is not even remotely optimized for the current state of
technology.  There is no leap of faith; this is not a question of
salvation through machinery.  It's the simple fact that technology
/exists/, would be useful to our cause, and we're not even remotely
utilizing it efficiently.

-Ross.

On 8/7/07, Rinne, Nathan (ESC) <RinneN_at_district279.org> wrote:
> Will,
>
> OK, my last post for the day - and I think for at least 3 weeks. (sorry
> Alexander, if you were actually looking forward to a response :) ).
>
> Thank you so much for the detailed description of what you do - it is
> very helpful, and I am filing this one away.  Really appreciated.
>
> You're right about conflating AI and RDF - I should not do that (yes, I
> could do more reading here, by the way).  So just change my post to the
> "salvific claims of technology" :)
>
> As for the rest, great stuff - yes, I understand the advantages here you
> are speaking of (this stuff should be utilized more and more to help
> serve the patron, as I've pointed out Mann has mentioned on many an
> occasion) and I think there are efforts happening right now with RDA in
> order to make this feasible for libraries too.
>
> Finally, you close:
>
> I don't really understand why catalogers that are obsessed with subject
> headings and strict cataloging rules don't immediately jump on RDF,
> topic maps or something similar, since it takes everything you like
> about your metadata and takes it to the next level.  I think this is
> where the claims of dogmatism come in. (end).
>
> The only "strict cataloging rules" I am obsessed about are the need for
> vocabulary consistency (agreement - which I suppose must translate to
> some degree of bureaucracy), the importance of detailed, descriptive
> headings (which means experts), and the corresponding ability, in
> theory, to see "everything" on a particular topic in the context of
> things like it (the "whole elephant") (which means context, including
> "interdisciplinarity").  That is it.  If this is dogmatism, I suppose I
> am dogmatic.
>
> Maybe I am wrong, but I don't have the confidence that other folks on
> this list really understand my concerns.  I would love to be proven
> wrong though, although I am not even sure at this point where to start
> in such a discussion.
>
> Having just read Eric Lease Morgan's last post however, I am not even
> really sure people even think that discussion is really worth having.
> Let these treasures of the past go... let the brave new world come.  In
> my mind, technology is losing its proper place (I never thought I or
> Mann was anti-tech! until...), and we have become drunk with it, to go
> along with our being obsessed with the "instrumentalization" of all
> knowledge (which after all, will also *soon* be able to be downloaded
> onto a computer)  :)
>
> Done fighting... for now. :)
>
> Regards,
> Nathan Rinne
> Media Cataloging Technician
> ISD 279 - Educational Service Center (ESC)
> 11200 93rd Ave. North
> Maple Grove, MN. 55369
> Work phone: 763-391-7183
>
>
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Will Kurt
> Sent: Tuesday, August 07, 2007 9:51 AM
> To: NGC4LIB_at_listserv.nd.edu
> Subject: Re: [NGC4LIB] Hot (MARC) metadata!
>
> "I get really skeptical about what I consider the rather salvific
> claims of AI.  I know I'm not the only one (again, I don't want to
> minimize real RDF advances on the ground, but don't see why controlled
> subject headings, for example, should not be a prime consideration
> here, in some sense)."
>
> Nathan, I get the sense that you could do a little more reading up on
> basic semantic technologies, and I'm not entirely sure why you keep
> bringing AI (at least in the 'strong AI' terms you seem to be
> describing) into the discussion about RDF, Topic Maps etc.
>
> Machine-readable metadata doesn't mean AI (at least not in the
> 'upload our brains into a computer' sense); it means it's easier for
> computer programs to parse out data that makes sense to humans.  MARC
> is itself a machine-readable format: it's not difficult for a program
> to figure out where the subject heading is, and where the title
> is.  (I'll explain where MARC falls short in a second.)  The problem
> is that current web metadata is very difficult to parse out and make
> sense of.  Let me give you a concrete example:
>
> I run a library job site for Massachusetts that aggregates all of the
> job postings from the major library job sites in the area.  One way
> for me to get these job postings is to manually go to each page once
> a week, and copy and paste the data into my site.  This can go from
> being excessively time consuming to essentially impossible given
> other responsibilities in my life.
>
> The next thing I can do (which is what I, in fact, do) is to find a
> way to extract the data from the other job sites.  But how do you
> write a program that finds job postings? You can't just tell the
> computer to grab the jobs.  So what you have to do is examine the
> underlying structure of the HTML on the site (which tells you
> absolutely nothing about the content or at the most very little). So
> for example I have to say 'Okay, in every third table the second row
> is a job title'.  This is also time consuming, but once you're done you
> can automatically extract data very easily.  Unfortunately, if the
> essential structure of the page is changed at all, everything breaks.
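
To make the fragility Will describes concrete, here is a minimal sketch
of that kind of position-based scraping, using only Python's standard
html.parser module.  The class name, the function name and both sample
pages are invented for illustration; this is an assumption about what
such a script might look like, not his actual code.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of every row of every <table>, by position only."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of row strings per table
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append("")
            self._in_row = True

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        if self._in_row:
            self.tables[-1][-1] += data.strip()


def job_titles(html):
    """Hard-coded rule: the second row of every third table is a job title."""
    parser = TableRowCollector()
    parser.feed(html)
    third_tables = parser.tables[2::3]
    return [rows[1] for rows in third_tables if len(rows) > 1]


# Hypothetical page layout: two decorative tables, then a posting.
PAGE = (
    "<table><tr><td>Site banner</td></tr></table>"
    "<table><tr><td>Navigation</td></tr></table>"
    "<table><tr><td>Posted 8/1</td></tr>"
    "<tr><td>Reference Librarian</td></tr></table>"
)
# The same page after a redesign adds one table at the top.
REDESIGNED = "<table><tr><td>Notice</td></tr></table>" + PAGE

print(job_titles(PAGE))
print(job_titles(REDESIGNED))
```

The rule finds the title on the first page and nothing on the second,
even though the posting itself is unchanged: the script knows where the
title sits, not what it is.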
>
> If these job sites used RDF or other semantic metadata I could just
> look up how they described jobs (or even better, it would already be a
> standard) and I literally could tell the program 'grab all the job
> titles, then the descriptions'.  No magic AI involved: all that
> information would be contained in the metadata.  (This is part of why
> so many music applications can use MusicBrainz so easily.)
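
As a rough sketch of 'grab all the job titles, then the descriptions',
the example below stores metadata as plain (subject, predicate, object)
tuples, which is the shape RDF statements take.  It does not use a real
RDF library, and the "job:" and "ex:" names are hypothetical, not an
existing vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers here are made up for the example.
TRIPLES = [
    ("ex:posting1", "rdf:type", "job:Posting"),
    ("ex:posting1", "job:title", "Reference Librarian"),
    ("ex:posting1", "job:description", "Staff the reference desk."),
    ("ex:posting2", "rdf:type", "job:Posting"),
    ("ex:posting2", "job:title", "Systems Librarian"),
    ("ex:posting2", "job:description", "Maintain the catalog server."),
    ("ex:banner", "rdf:type", "ex:Advertisement"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def postings(triples):
    """Find everything typed as a job posting; return (title, description)."""
    found = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "job:Posting":
            title = objects(triples, s, "job:title")
            desc = objects(triples, s, "job:description")
            found.append((title[0] if title else None,
                          desc[0] if desc else None))
    return found


for title, description in postings(TRIPLES):
    print(title, "-", description)
```

Nothing here depends on where a title appears on a page, so a redesign
of the site would not affect it as long as the statements stay the same.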
>
> Okay, but that's just the tip of the iceberg, and something that can
> also be done by APIs.  Another important thing in semantic metadata is
> that it maps out the relationships among the entities described in
> the metadata.  For example, each MARC record literally contains the
> text of the LC subject heading; an RDF document would simply point to
> the entity of the subject heading, which itself could point towards
> other things.
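
The difference between carrying the heading's text and pointing at the
heading can be sketched as follows.  The identifiers, labels and the
"broader" link are hypothetical and are not taken from LCSH; the dicts
only stand in for a MARC record and an RDF description.

```python
# MARC-style: the record carries the heading as a string of text.
marc_like = {"title": "Cataloging basics", "subject": "Cataloging"}

# RDF-style: the record carries an identifier that points at the heading.
rdf_like = {"title": "Cataloging basics", "subject": "ex:subj/cataloging"}

# The headings are entities of their own and can point at other entities.
HEADINGS = {
    "ex:subj/cataloging": {
        "label": "Cataloging",
        "broader": "ex:subj/library-science",
    },
    "ex:subj/library-science": {
        "label": "Library science",
        "broader": None,
    },
}


def heading_chain(subject_id, headings):
    """Follow 'broader' links upward from a heading, collecting labels."""
    labels = []
    seen = set()  # guards against an accidental loop in the data
    while subject_id is not None and subject_id not in seen:
        seen.add(subject_id)
        entry = headings[subject_id]
        labels.append(entry["label"])
        subject_id = entry["broader"]
    return labels


print(heading_chain(rdf_like["subject"], HEADINGS))
```

With the string in marc_like a program has only the text to match on;
with the identifier in rdf_like it can walk from the record to the
heading and on to whatever the heading links to.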
>
> So in my job site example, suppose a job was posted by a local
> university.  This would be contained in the metadata, and I could
> literally tell a program to simply go get the university info as
> well, and it's easy to imagine that the university would link to
> other valuable metadata etc.  All of this info would be very, very
> easy to extract, and simple decision trees in a program could yield
> amazing results.
>
> I don't really understand why catalogers that are obsessed with
> subject headings and strict cataloging rules don't immediately jump
> on RDF, topic maps or something similar, since it takes everything
> you like about your metadata and takes it to the next level.  I think
> this is where the claims of dogmatism come in.
>
> --Will
>
>
> At 08:51 AM 8/7/2007, you wrote:
> >Alexander:
> >
> >"No more databases, search engines and browsing subject headings ; I
> >want to tap into more human knowledge, and for that there's blogs,
> >comments, podcasts and the *content* of books. So. What do we do next?"
> >(end)
> >
> >Alexander, I hope to be able to give you a more substantial response
> >myself to your piece later on this week, but for now, let me just say I
> >get really skeptical about what I consider the rather salvific claims
> >of AI.  I know I'm not the only one (again, I don't want to minimize
> >real RDF advances on the ground, but don't see why controlled subject
> >headings, for example, should not be a prime consideration here, in
> >some sense).
> >
> >No, I'm pretty skeptical.  After all, some people *really* want to "tap
> >into human knowledge".  When Marvin Minsky, who heads up the Media Lab
> >from ever-so-reputable MIT, suggests that increases in AI will soon
> >allow us to upload the minds of the world's billions, running them on a
> >computer that costs only a few hundred dollars - and thereby solving
> >the world's population problem - I balk just a little bit.  :)
> >
> >Yikes.  Give me Thomas Mann and Martha Yee any day (yes - straw man
> >argument - or not? :) ) - at least they seem to be talking about things
> >that are real.  Go ahead and accuse me of a lack of imagination!
> >
> >Regards,
> >Nathan Rinne
> >Media Cataloging Technician
> >ISD 279 - Educational Service Center (ESC)
> >11200 93rd Ave. North
> >Maple Grove, MN. 55369
> >Work phone: 763-391-7183
>
Received on Tue Aug 07 2007 - 10:27:02 EDT