Ted,
I have trouble understanding why so many feel that Mann's depiction of
our current research environments represents "the way things should be".
I'm more inclined to see the difficulty of assessing resource quality
on the web as a symptom of a larger problem in the library approach
to organizing information.
I don't necessarily think the answers lie in "cataloging selected web
resources." Manual collection development to produce an authoritative
and comprehensive subset of the "Scholarly Internet" seems a losing
proposition. Instead, I think there are ways we can leverage our
_principles_ for organizing information and apply them to the Web at large.
For example, I spend a lot of time evaluating Internet resources,
especially blogs, for the purpose of my own research and general
edification. Even in the library and general technology fields, I
cannot always quickly identify blogs that I want to track. This is true
even though, in these fields, I can often expect to have a sense of an
author's scholarly reputation. Outside of the areas in which I'm
reasonably well versed, I'm even less likely to be able to successfully
evaluate a resource.
I'd love it if, for example, Technorati could include data from LCNAF to
help me evaluate resources. This is just an offhand example, but it
illustrates how exposing our data in more common formats would be
valuable. That would allow users to make more effective use of the Web
as a source in the "complex, difficult process" that "real" research is.
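To make the LCNAF idea a bit more concrete, here is a rough sketch of
the kind of lookup a service like Technorati could run if the NAF were
exposed through a simple machine-readable interface. The suggest-style
endpoint and the response shape below are illustrative assumptions on
my part, not an existing Technorati or LC API:

    #!/usr/bin/env python3
    """Sketch: check a name against a hypothetical machine-readable
    name authority service and return established headings that match.
    The endpoint and its OpenSearch-Suggestions-style response layout
    are assumptions for illustration only."""

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical suggest-style lookup endpoint for the NAF
    SUGGEST_URL = "https://id.loc.gov/authorities/names/suggest/?q={query}"

    def authority_matches(name):
        """Return (heading, identifier) pairs suggested for a name."""
        url = SUGGEST_URL.format(query=urllib.parse.quote(name))
        with urllib.request.urlopen(url) as response:
            # Assumed layout: [query, [headings], [notes], [identifiers]]
            _, headings, _, identifiers = json.load(response)
        return list(zip(headings, identifiers))

    if __name__ == "__main__":
        # A blog author's byline, as an aggregator might see it
        for heading, uri in authority_matches("Twain, Mark"):
            print(heading, uri)

An aggregator could then show the established heading and its
identifier alongside a blog, which is exactly the kind of context I
find myself hunting for by hand.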
They're going to the Web anyway. Many of us turn to the Web as well,
and I like to think that my Web research is "real" research. Why not
help make networked resources better suited to the task of research?
-Corey
Ted P Gemberling wrote:
> Jonathan,
> Thanks for your summary. More about that below.
>
> Karen wrote:
>
> "My guess is that we are going to improve our catalogs incrementally
> ..."
>
> I appreciate that. That's often the safest way to do things. It enables
> you to see the costs and benefits at each step. If you try to go too
> fast, there's a good chance you'll regret something later.
>
> "Beyond that, we also need to embrace incoming data and resources that
> differ from library standards so that we can be seen as a source of all
> information, not just "library" information."
>
> A big part of Mann's position is to emphasize that libraries and the Web
> really do have different values. No one denies that we need to catalog
> some things on the web. After all, journals are going electronic, and
> there's probably no way, and no reason, to stop that. At the very least,
> it will solve huge problems of space. The web is one medium of
> publication. But libraries, especially in certain fields, are made up of
> more than journals, and libraries convey more than information. They
> convey knowledge, a higher, more integrated level of awareness. I don't
> want to annoy people with a lot more philosophical postings, but at the
> bottom I'm going to copy and paste something I posted several weeks ago
> that states my own personal take on the difference.
>
> On his blog (http://bibwild.wordpress.com/2007/05/25/broken-huh/),
> Jonathan writes:
>
> "There are very basic questions of high interest to our users that our
> data set is unable to answer, even though we are spending time recording
> information that ought to be available to answer these questions. One
> very good example - and it's just one example - is Roy Tennant's analysis of
> the inability to say whether full content is available online even
> though we are already spending time recording URL information."
>
> Now, I'm not going to say we definitely shouldn't make that crystal
> clear on our OPACs, if there's some way to do so. (I'm not an electronic
> or media cataloger, so that's kind of out of my department.) But I do
> have to ask how much of a burden that uncertainty really is on users.
> This seems to assume that as soon as you enter your search query and get
> a result set, the availability of full text should be immediately
> discernible. I
> realize it is on many electronic databases. But is that really a major
> problem for researchers? To have to click a few more times?
>
> In his publications, Thomas Mann emphasizes that real research is a
> complex, difficult process that often has to be approached from various
> angles. It takes time. And you often need training from reference
> librarians on what to look for if you're in an unfamiliar area.
>
> Not having read the Autocat postings Tennant refers to, I don't really
> know why catalog records do not indicate full text in many cases. But
> I'm guessing that it's something that wasn't regarded as important to
> the designers of the records, in comparison with other things.
>
> "The metadata system/environment we have now was very intelligently
> optimized for the social, economic, and technical context of the mid
> 20th century."
>
> I'd opine that it's, at the very least, optimized for the last decade of
> the 20th century. Personally, I think it's optimized for this decade,
> but there's absolutely no justification for claiming it's as archaic as
> the mid-20th century. A lot of advantages have come with online
> catalogs: information is accessible in many more ways today, even if the
> content that was on cards has remained constant to a certain extent.
>
> Another point I imagine someone might bring up would be
> post-coordination as a "better tool" than pre-coordination, since it's
> more "web friendly." The best thing I know on that topic is this piece
> by Mann from the Bibliographic Control for the New Millennium
> conference:
> http://www.loc.gov/catdir/bibcontrol/mann_paper.html
> It's long, but worth reading.
>
> Jonathan, I'll look more at your blog and the responses to it. Thanks to
> Bernhard and Alexander for their postings on this thread, too.
> --Open-mindedly yours, Ted Gemberling
>
> Libraries and the Web (with personal references removed)
>
> Here's a stab at how we might distinguish the purposes of libraries and
> the Web. I think libraries, as public institutions, are in the business
> of preserving information that the public (or maybe better, the "body
> politic") has decided is important. The things which are necessary for
> education, research, public safety, and other concerns. That isn't
> really contradicted by public libraries' fiction sections, because they
> just show that the "body politic" has decided it's important to provide
> entertainment, too. Nor is it contradicted by some libraries being
> privately owned, because even if they're private--unless they're just
> "libraries" in people's homes--they have to reflect "public" concerns to
> some extent. Otherwise no one will use them.
>
> In contrast, the Web is centered on the interests of individuals. It is
> often ... "loose data." It is the realm of freedom and personal
> preference, and somewhat of chaos. Great sites like IMDb or Google exist
> because people want to look for things outside what is provided by the
> public institution of libraries. If you're a film buff like me, you
> won't be satisfied by what libraries can give you. And we wouldn't want
> to make libraries tell us everything about movies. At least not most
> libraries.
>
> This isn't to say you can't publish things, even "serious" things like
> electronic journals, on the Web. Though the "serious" ones are more
> likely to come with a price. Maybe I should say the Web is a realm that
> contains both "raw" and "controlled" data, and librarians select
> strictly from the things they've decided are important.
>
> On the Web, it's questionable that one really has an inalienable right
> to anything. I'm sympathetic to "Net Neutrality," but I wonder if we
> might have to realize that as an entity that exists for individuals'
> whims and interests, the Internet may not be able to provide equal
> access to everybody. That may be another important purpose of libraries,
> to provide a place where individuals who can't afford fast access
> at home can get it. But capitalism may hold sway on the Web, as in most
> forms of publishing.
>
> Here's an example of the value of "loose data." I catalog 19th century
> books, and many of them have signatures that are pretty illegible.
> Sometimes I can only guess at how to read people's handwriting. Google
> is a terrific source for deciphering the signatures at times. LC's Name
> Authority File can help somewhat, but it's much farther than Google from
> containing every personal name that has ever existed. On Google, I can
> try different possible readings of the names and see which ones have
> matches. After I do that, I may go to the NAF to see if there's a
> corresponding heading.
>
> As a library cataloger, my job is to translate that "loose data" into
> something that isn't "loose." Of course established headings exemplify
> "non-looseness." When something goes from the realm of the private to
> the public, looseness has to stop for the most part. Transcriptional
> fields like the 246 are looser, but even they are governed by some
> strict rules.
--
Corey A Harper
Metadata Services Librarian
Bobst Library
New York University
70 Washington Square South
New York, NY 10012
212.998.2479
corey.harper_at_nyu.edu