Re: After MARC...MODS?

From: James Weinheimer <j.weinheimer_at_nyob>
Date: Mon, 26 Apr 2010 06:01:18 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

On Thu, 22 Apr 2010 21:53:44 +1000, Alexander Johannesen
<alexander.johannesen_at_GMAIL.COM> wrote:

>On Thu, Apr 22, 2010 at 21:18, Weinheimer Jim <j.weinheimer_at_aur.edu> wrote:
>> How about this? Pretend that you are interested in the history
>> of black people in agriculture in the United States. Tell me how
>> you would go about searching and retrieving information in
>> Google or Google Scholar or a related tool using full text.
>
>Well, I pop "black people in agriculture in the United States" and get
>back 1.3 million hits of which, one can assume, there is valuable
>information. I go through them, and copy and paste into my research
>document anything that smacks of gold.
>
>And this satisfy your use case 100%; this is someone with an interest.

Sorry I didn't respond to this earlier; I only saw it now. Your results
and reactions are extremely interesting.

First:
You say that you get back "...1.3 million hits of which, one can assume,
there is valuable information. I go through them, and copy and paste into my
research document anything that smacks of gold." 

What??? You go through 1.3 million hits???? You are a really fast reader!
Sorry, but that one I will never believe. And this is one of the main
problems my students find when they use a tool such as Google in
practice: the result is completely out of control. When they have gotten the
same types of results for simple purposes relating to their own general
interest, they don't care so much about the search result since it can be
fun to "surf." But when they are grappling with something important such as
writing a paper, where they have to stick to a specific topic, and they
could flunk out if they quote something stupid, they see Google as something
much less useful and more similar to a toy. And it frightens them.

They feel there may be something there in the search result of 1.3
million (and I add: or not there, see below), but the results are in a
completely unpredictable order that changes constantly, based on the number
of links to an item (and thus, place #1 is determined primarily by bloggers),
plus a number of other factors that determine ranking and are
business secrets of Google. It has been shown without a doubt that this
order can be manipulated for all sorts of purposes (for obvious examples,
see Google-Bombing in Wikipedia, but this is being done constantly in far
more subtle ways). As I tell my students, the Google use of the term
"relevance" does *not at all* equal their own understanding of the term
"relevance" and they should not confuse the two. The Google use is a
secretive business term but one chosen strategically to make their customers
more comfortable. It works.

Second:
What exactly are you looking at when you see the results from "black people
in agriculture in the United States" and also, what are you not looking at?
Well, you miss many original documents, because the term "blacks" was not
the word used for African-American people in agriculture in the early United
States. There were other terms used, some highly insulting today. When a
cataloger puts in metadata, it's a completely different matter. In a library
catalog, you don't have to search these older terms, but in full-text you
do, or they will never come up in the result--and you will never realize it.
As a result, you miss entire categories of really useful information. 

Other problems: "agriculture" is unnecessarily limiting. You would also have
to search at least "farming" but probably others as well. Searching "United
States" will miss most of the information in the individual states, where
there will be lots of possibly the most interesting resources. 
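
To make the vocabulary problem concrete, here is a minimal sketch in Python.
Everything in it is an assumption made up for illustration: the four toy
documents, the subject headings attached to them, and the function names
full_text_search and subject_search. It does not describe how Google or any
real catalog works; it only shows how a literal word match can miss items
that a cataloger has gathered under one heading.

```python
# Toy illustration only: the documents, headings, and function names are
# invented for this sketch and do not describe any real catalog or search engine.

DOCUMENTS = {
    "doc1": {
        "text": "a history of black people in agriculture in the united states",
        "subjects": {"African American farmers"},
    },
    "doc2": {
        "text": "freedmen and sharecropping in georgia after 1865",
        "subjects": {"African American farmers"},
    },
    "doc3": {
        "text": "farming cooperatives in alabama",
        "subjects": {"African American farmers"},
    },
    "doc4": {
        "text": "agriculture in the united states: a statistical survey",
        "subjects": {"Agriculture"},
    },
}


def full_text_search(query_words):
    """Return ids of documents whose text contains every query word."""
    hits = []
    for doc_id, doc in DOCUMENTS.items():
        words = set(doc["text"].replace(":", " ").split())
        if all(w in words for w in query_words):
            hits.append(doc_id)
    return sorted(hits)


def subject_search(heading):
    """Return ids of documents a cataloger assigned to the given heading."""
    return sorted(d for d, doc in DOCUMENTS.items() if heading in doc["subjects"])


full_text_hits = full_text_search(["black", "agriculture", "united", "states"])
subject_hits = subject_search("African American farmers")
missed = sorted(set(subject_hits) - set(full_text_hits))

print("full text:", full_text_hits)
print("subject heading:", subject_hits)
print("missed by full text:", missed)
```

In this toy data the word match finds only the one document that happens to
use the searcher's own words, while the heading also brings back the items
that say "freedmen", "farming", or name a single state. The searcher using
full text alone has no way to know the other two exist.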

I won't discuss "quality of information" here, which is another huge problem
that people have to face every day. You say, "anything that smacks of gold"
but how am I supposed to know that? Also, I won't discuss exactly what
Google is and is not searching when you do a search, because this is another
of their closely held secrets.

So we see that what at first glance appears to be extremely simple, typing a
few words into a box and getting a result, is incredibly complex and
terribly limiting. It takes an expert to understand how limiting it is.
Google has done an excellent job of making it seem to be simple, and they
have done this by designing a tool to *make people happy*, but we should not
confuse this with providing results that are reliable and comprehensible,
which is what people really want. And it has serious consequences, as
students will tell you.

I would suggest that when people see matters in this way, they will see the
immensity of such a task, and that they will have a bit more respect for the
work done in catalogs, which smooths the way for people. But of course, "we
won't get no respect!"

Perhaps this is too detailed for your purposes, but it certainly is not too
detailed for the students I work with, who are being serious about it and,
as I say, terribly worried about it since not dealing with it could derail
their entire careers.

Library catalogs are designed on different principles and have strengths in
exactly these areas, and this is why I think that creating a tool that would
bring the strengths of library catalogs together with full-text retrieval
tools would be the best. But simply ignoring what our tools can do would be
the same as allowing superstition and bias and even censorship to run rampant.

James Weinheimer  j.weinheimer_at_aur.edu
Director of Library and Information Services
The American University of Rome
via Pietro Roselli, 4
00153 Rome, Italy