Re: Libraries & the Web--was Down and The Shaft

From: Ted P Gemberling <tgemberl_at_nyob> Date: Thu, 17 May 2007 18:14:03 -0500 To: NGC4LIB_at_listserv.nd.edu

Ross,
Thanks for describing your system. I'd say anything that helps people
find what they want is good. You can probably tell I'm more concerned
with vocabulary than systems. I'm open to different kinds of software,
markup languages, etc. My main concern is threats to controlled
vocabulary. Controlled vocabulary is important because there is a
certain arbitrariness that is inescapable, I think, in any subject
indexing system. And that arbitrariness is something we should accept
rather than trying to do away with. (Let's hold onto the Second Order!)
Here's an excerpt from something I wrote a few months ago about that,
related to an automatic name disambiguation system called MathSciNet:

"Keep in mind that this still requires somebody to assign subject
headings or classifications. As the MathSciNet article says, it's their
editors who assign them. The computer program can't begin its comparison
process until that element is in place. I think there's a serious
question whether that could be automated because of the "conventional"
and "arbitrary" nature of languages. Both the English language and MeSH
[Medical Subject Headings] are conventional in the sense that the terms
mean what they mean mainly because we agree that they do. There's
usually little inherent in the terms themselves that can tell us what
they mean: they're not similar to their referents for the most part, or
always used in the presence of those referents.

"And something like MeSH also has to be very "arbitrary," in the sense
that it divides a huge continuum of ideas and things into a manageably
small group of concepts. So I kind of doubt a computer program could
reliably interpret keywords in terms of subjects. Maybe not impossible,
but doubtful for the present. The process would have to involve, not
just matching character strings in files, but interpreting the
relationships between words in the files. It's possible a computer
couldn't do that without actually being able to understand language. I
don't believe a computer system has been developed so far that does that
very successfully.

"One problem is that we humans have a fundamentally different sort of
relationship to words than computers have: we're not just doing
computations with them, but actually interacting with their referents in
the real world. We see them, don't just think of the words for them. And
that allows us to use language metaphorically: the metaphoric uses
depend on the relationships between one referent and another, not just
one word and another. Metaphor allows us to extend the meanings of words
and phrases beyond their ordinary meanings. I doubt that a computer
could understand metaphorical language, though I'm not sure how much
automatic assignment of subject terms in scientific or medical fields
would depend on that. It probably would depend on it in the humanities.

"Think of it this way. We know computers can find character strings very
well in large bodies of data, in fact faster and more accurately than we
can. But expecting them to assign subject headings would be sort of like
expecting them, not just to scan, but create that data. In fact, it
would be like expecting them to write our articles for us. I'm not
saying it's impossible, but presently it only happens in science
fiction. Or at least that's the way it seems to me now--I'm open to
correction if there's evidence that isn't true."
        --Ted Gemberling

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Ross Singer
Sent: Tuesday, May 15, 2007 7:24 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Libraries & the Web--was Down and The Shaft

At Georgia Tech, we, like basically everybody else at this point, are
building our Next Generation "catalog".  We call it the "Communicat".

We are building it on top of the Daisy CMS.  We've chosen Daisy
because it allows you to define schemas for document types and each
document can contain many parts, fields or include other documents.

Right now we've got two general types of documents:  Pages and
Components.

Pages are the display layer and combine zero or many components.  They
include editable regions and the ability to include syndicated feeds
(TOC, say).  We can grab descriptions, link to Wikipedia or reviews in
Revish or OpenWorldcat at the Page level.

Components are generally non-(directly)-displaying and generally
include metadata derived from authoritative metadata records.  Here we
include MARC records, EAD, DC, ONIX, etc.  These records inform the
Page record on basically how it should display.

There are three 'realms' in which Page records can reside:  core,
community and world.  The Communicat is intended to allow users and
groups to be able to build their own collections (via social
bookmarking) and then scope their searches based on items in their
groups.

"Core" items are records that are viewed as "official Georgia Tech
assets":  most likely seeded from the library's collections, but there
are other repositories to draw from across campus.

"Community" items are added by students, faculty and staff of Georgia
Tech, either through courseware, registered social bookmarking sites
(such as Connotea or del.icio.us), or by using GaTher, which is the
Communicat's citation manager/social bookmarker.

"World" items are those added by people who aren't associated with
Georgia Tech (the system doesn't discriminate, in order to facilitate
cross-institutional scholarship).

The components don't belong to a realm since they can only be modified
by whoever added them and are intended to be used in pages.  Most
people won't realize they even exist.
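
To make the structure above concrete, here is a rough sketch in Python.
It is only an illustration of the Page/Component/realm idea, not Daisy's
actual API or the Communicat's code: every class, field and function
name below is hypothetical, and scoping a search by realm is an
assumption standing in for scoping by a group's collection.

```python
from dataclasses import dataclass, field
from enum import Enum


class Realm(Enum):
    """The three realms a Page can live in (names are illustrative)."""
    CORE = "core"            # "official Georgia Tech assets"
    COMMUNITY = "community"  # added by students, faculty and staff
    WORLD = "world"          # added by people outside the institution


@dataclass
class Component:
    """Non-displaying metadata record (MARC, EAD, DC, ONIX, ...)."""
    format: str
    added_by: str
    data: dict

    def can_modify(self, user: str) -> bool:
        # Components have no realm; only whoever added one may change it.
        return user == self.added_by


@dataclass
class Page:
    """Display-layer document combining zero or more components."""
    title: str
    realm: Realm
    components: list = field(default_factory=list)


def scoped_search(pages, term, realms):
    """Return pages in the given realms whose title contains the term."""
    term = term.lower()
    return [p for p in pages if p.realm in realms and term in p.title.lower()]


# Hypothetical sample data, for illustration only.
marc = Component(format="MARC", added_by="library", data={"245": "The shaft"})
pages = [
    Page("The shaft", Realm.CORE, [marc]),
    Page("The shaft: fan notes", Realm.WORLD),
]
core_hits = scoped_search(pages, "shaft", {Realm.CORE})
```

Keeping the realm on the Page rather than on the Component mirrors the
description above: the same metadata record could, in principle, sit
behind pages in different realms.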

We had hoped to roll out the core collection to coincide with our site
redesign last week, but some snags related to processing the MARC
records to go into it will probably push that back into June sometime.

I guess my point with all this is, we aren't implementing this so that
our users can tag our government documents and conferences (although,
that will be a side-effect), but so that anyone, mainly reference
librarians or faculty most likely, can make relationships between
novels and film adaptations or links to critical reviews or, yes, the
IMDb record.  It's an attempt to make the collection organic, rather
than a tightly controlled silo.

-Ross.

On 5/15/07, Ted P Gemberling <tgemberl_at_uab.edu> wrote:
> Ross,
> Of course Mann is not the only authority on library science. He isn't
> always right. Not in details, but I think in broad concepts he has
> been, as far as I've seen. But I'm not refusing to listen to others.
>
> Actually, I don't believe there are any authorities in any subject.
> There are just people we get insight from to varying degrees. I have
> to admit I've gotten a lot from Thomas Mann and not much from a lot of
> others.
>
> Now, when you say it's cheap to allow users to make the relationship
> between De lift and The shaft, I would say it depends on what the
> impact of that will be. It's not cheap if it's something a cataloger
> has to intervene in, in some way. If it's "tagging" or "folksonomies"
> of some sort that are not put on the MARC record, but are just part of
> some user's "my catalog," then it's cheap as soon as the system for
> linking it to titles has been set up. In that way, "controlled,
> authoritative metadata lives in harmony with community contributed
> data" (or maybe more correctly, "personally contributed data").
>
> But there has to be a realm where the "community" can't contribute
> data. They shouldn't contribute data to the MARC record, at least not
> directly. That has to be a realm that is controlled and follows strict
> standards. Those standards exist partly to keep costs down, and partly
> to make the information as useful to as many people as possible.
> That's because libraries are public institutions.
>         --Ted Gemberling
>
>
> Not an official statement of the UAB Lister Hill Library
>
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Ross Singer
> Sent: Tuesday, May 15, 2007 2:22 PM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] Libraries & the Web--was Down and The Shaft
>
> But I think the point is that it's "cheap" to allow our users to
> make relationships between "De Lift" and "The Shaft" if they find it
> valuable.
>
> There is no reason that controlled, authoritative metadata can't live
> in harmony with community contributed data.
>
> Thomas Mann is not the sole voice or opinion on the theories of
> information science.
>
> -Ross.
>
> On 5/15/07, Ted P Gemberling <tgemberl_at_uab.edu> wrote:
> > Thomas,
> >
> > Thanks. I don't want to irritate people by saying anything more in
> > detail about that movie, but I'd like to comment briefly on this
> > question you raised. I'll get philosophical later, and of course no
> > one has to read that if they're not interested.
> >
> > You wrote:
> >
> > "This information about "De Lift" came from the review attached to
> > the IMDb record--a plus for user-supplied social networking data. An
> > interesting question arises. Say that the cataloguers would always
> > miss that data, but a user found this relationship and supplied it.
> > Would someone then make a more formal link (a 730 field or
> > equivalent) in a catalogue record in some sort of global next gen
> > catalogue? Or is that title information just loose data that other
> > end-users could make use of if necessary? I would say that
> > information from the end-user should be a trigger for more formal
> > use of data fields with the final decision made by librarians ..."
> >
> > It strikes me that we can get into trouble trying to tie the entire
> > information world together. I'd opt for the "just loose data" answer
> > to the question. I appreciate the fact that IMDb gives us this
> > information about the relation to De lift. But I doubt that it's
> > worth the time, money, and effort of libraries (or the "library
> > world") to show what this rather obscure 2001 film is a remake of.
> >
> > Now, if "The shaft" came to be regarded as some sort of classic of
> > cinema art, and scholars around the world were studying it as such,
> > interest in De lift and its relation to The shaft would increase. I
> > think there are pairs of films like that, though I can't remember
> > any examples right now. It might then make sense to add the title
> > added entry to cataloging records.
> >
> > My philosophical point:
> >
> > What I said above relates to a perspective I came to after reading
> > Thomas Mann. I think we need to recognize that the Web and library
> > catalogs have different purposes. They are both valuable, but we
> > shouldn't think their values are the same. We shouldn't have to
> > catalog the whole Web or show relationships between all kinds of
> > information. And I think the reason some people (not Thomas
> > Brenndorfer) want to discontinue things we've been doing is that
> > they want to catalog almost everything, which means they can't
> > catalog anything in much detail.
> >
> > Here's a stab at how we might distinguish the purposes of libraries
> > and the Web. I think libraries, as public institutions, are in the
> > business of preserving information that the public (or maybe better,
> > the "body politic") has decided is important. The things which are
> > necessary for education, research, public safety, and other
> > concerns. That isn't really contradicted by public libraries'
> > fiction sections, because they just show that the "body politic" has
> > decided it's important to provide entertainment, too. Nor is it
> > contradicted by some libraries being privately owned, because even
> > if they're private--unless they're just "libraries" in people's
> > homes--they have to reflect "public" concerns to some extent.
> > Otherwise no one will use them.
> >
> > In contrast, the Web is centered on the interests of individuals. It
> > is often, in Thomas Brenndorfer's terms, "loose data." It is the
> > realm of freedom and personal preference, and somewhat of chaos.
> > Great sites like IMDb or Google exist because people want to look
> > for things outside what is provided by the public institution of
> > libraries. If you're a film buff like me, you won't be satisfied by
> > what libraries can give you. And we wouldn't want to make libraries
> > tell us everything about movies. At least not most libraries.
> >
> > This isn't to say you can't publish things, even "serious" things
> > like electronic journals, on the Web. Though the "serious" ones are
> > more likely to come with a price. Maybe I should say the Web is a
> > realm that contains both "raw" and "controlled" data, and librarians
> > select strictly from the things they've decided are important.
> >
> > On the Web, it's questionable that one really has an inalienable
> > right to anything. I'm sympathetic to "Net Neutrality," but I wonder
> > if we might have to realize that as an entity that exists for
> > individuals' whims and interests, the Internet may not be able to
> > provide equal access to everybody. That may be another important
> > purpose of libraries, to provide a place where individuals who can't
> > afford fast access to it at home can get it. But capitalism may hold
> > sway on the Web, as in most forms of publishing.
> >
> > Here's an example of the value of "loose data." I catalog 19th
> > century books, and many of them have signatures that are pretty
> > illegible. Sometimes I can only guess at how to read people's
> > handwriting. Google is a terrific source for deciphering the
> > signatures at times. LC's Name Authority File can help somewhat, but
> > it's a lot farther from containing every personal name that has ever
> > existed than Google. On Google, I can try different possible
> > readings of the names and see which ones have matches. After I do
> > that, I may go to the NAF to see if there's a corresponding heading.
> >
> > As a library cataloger, my job is to translate that "loose data"
> > into something that isn't "loose." Of course established headings
> > exemplify "non-looseness." When something goes from the realm of the
> > private to the public, looseness has to stop for the most part.
> > Transcriptional fields like the 246 are looser, but even they are
> > governed by some strict rules.
> >         --Ted Gemberling
> >
> > Not an official statement of the UAB Lister Hill Library
> >
> >
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Brenndorfer, Thomas
> > Sent: Monday, May 14, 2007 3:31 PM
> > To: NGC4LIB_at_LISTSERV.ND.EDU
> > Subject: [NGC4LIB] Down and The Shaft
> >
> > I found a record for "The shaft" DVD at Vancouver Public Library. The
> > catalogue record also had
> >
> > 246 $iOriginal title:$aDown
> >
> > I suppose as a remake, "De Lift" should be a related work heading on
> > any record for Down/The shaft. A work-to-work relationship in FRBR
> > terms--not one work issued under two different titles.
> >
> > This information about "De Lift" came from the review attached to
> > the IMDb record--a plus for user-supplied social networking data. An
> > interesting question arises. Say that the cataloguers would always
> > miss that data, but a user found this relationship and supplied it.
> > Would someone then make a more formal link (a 730 field or
> > equivalent) in a catalogue record in some sort of global next gen
> > catalogue? Or is that title information just loose data that other
> > end-users could make use of if necessary? I would say that
> > information from the end-user should be a trigger for more formal
> > use of data fields with the final decision made by librarians. If
> > the fields exist and these are the FRBR entities and relationships
> > of concern for bibliographic control, then the data should be filled
> > in correctly in the next gen catalogue.
> >
> > >>Thomas, where did you get the information that "Down" was the
> > original title? Maybe that's buried somewhere on IMDb, but it's not
> > something an average moviegoer would know, I think. IMDb says it's a
> > remake of a 1983 Dutch film, De lift.
> >
> > Thomas Brenndorfer, B.A., M.L.I.S.
> > Guelph Public Library
> > 100 Norfolk St.
> > Guelph, ON
> > N1H 4J6
> > (519) 824-6220 ext. 276
> > tbrenndorfer_at_library.guelph.on.ca
>