Authority in an Age of Open Access (an analysis)

From: James Weinheimer <weinheimer.jim.l_at_nyob>
Date: Tue, 6 Nov 2012 12:10:25 +0100
To: NGC4LIB_at_LISTSERV.ND.EDU

Apologies for cross-posting, but I thought both lists would be interested.

I would like to share a talk by Clay Shirky, the Internet guru, entitled
"Authority in an Age of Open Access"
http://www.cornell.edu/video/?videoID=2396.

In one part (5 minutes in), he talks about a project of the Smithsonian
Institution, when they put up several thousand images on Flickr and
asked people (anyone) to tag them.
http://www.flickr.com/photos/smithsonian/. He says that this shows what
happens when you "take a job solely for curators and you invite the
public in". He then goes on to mention how there is now a huge,
tremendous list of tags produced by the public and discusses three tags
of interest to him. He obviously considers the huge list of tags as a
positive, but his talk goes in directions different from what I want to
pursue here.

As a cataloger, I look at it a little differently. The public
undoubtedly did a huge amount of work on these images and all can see
it, but from the viewpoint of access, what is the result? Of course,
there are lots of images and I cannot look at them all, but I chose one
set at random (15 images) "Mary Agnes Chase Field Books"
http://www.flickr.com/photos/smithsonian/sets/72157629227635110/with/6985375963/
and considered the tags that were--and were not--assigned.

The first thing I discovered was that there is practically no
consistency of the tags within the set. Just looking at the first two
photos illustrates it. The first is a wonderful photo of two little
girls in Brazil labeled "Two of Agnes Chase's favorite subjects."
http://www.flickr.com/photos/smithsonian/6985375815/in/set-72157629227635110/
and there are several tags:
children, girls, two, seated, steps, outdoors, Brazil, 1920s, twenties

The next photo, just as interesting, is labeled "Serra da Gramma [sic].
Dr. Rolfs, jungly bamboo slope between fazendo and Araponga."
http://www.flickr.com/photos/smithsonian/6985375845/in/set-72157629227635110/
But it lacks any tags at all. I don't know the subject area, but I did
find "Arapongas (Paraná, Brazil)" in the NAF. Yet, if you look in the
comment section, one person, "Pixel Wrangler", made some suggestions for
corrections, one of which was actually implemented by the Smithsonian.
At the same time, the Smithsonian staff member (librarian?) was able to
explain a couple of fine points. This led one person to remark "wow
....." but I don't know if it was the photo this person found so amazing
or the exchange between "Pixel Wrangler" and "Smithsonian Institution".

Looking at the set as a whole, only the first and last photos had a
geographic tag (Brazil), although a total of 9 were taken in Brazil, 1 in
Guatemala, 2 in Mexico, 1 in Nicaragua, 1 in Alaska, and 1 in Arizona.

8 out of the 15 (the majority) had no tags at all, other than those the
Smithsonian gave to each one: "Smithsonian Institution Archives,
Smithsonian Institution, Women's History Month". Of those that had tags,
some photos had national park names added, e.g. "Itatiaia National Park",
which is "Parque Nacional do Itatiaia (Brazil)" in the NAF.

Some conclusions from this highly cursory analysis: looking at the huge
tag cloud http://www.flickr.com/photos/smithsonian/tags/ should now give
someone pause. We now know that the tags for "Brazil"
http://www.flickr.com/photos/smithsonian/tags/brazil/ are *not* all the
photos of Brazil, even within this small 15-photo collection. We see
only two when there should be at least nine. Who knows how many photos
of Brazil there are within the rest of the collection? If this is so
undeniably true for this single tag, what are you really looking at for
each of the rest of the tags? The first photo has the tags "girls"
and "children" but this photo has nothing
http://www.flickr.com/photos/smithsonian/6839255684/in/set-72157629227635110/.
When you click on the tag "children" in the huge tag cloud, you will
*not* retrieve this photo. This shows how people assume a lot when they
click on a tag. (Of course, this applies equally to all headings in a
library catalog.)

Or perhaps people don't assume. Or maybe they don't care. Nevertheless,
they should be aware of something that seems so vital, and yet so easily
hidden, as are the 7 photos from this collection when someone clicks on
the Brazil tag. How is somebody supposed to know?
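
To make the arithmetic behind the Brazil example explicit, here is a
minimal Python sketch. The counts (15 photos, 9 taken in Brazil, 2 of
them tagged "Brazil") come from my tally above; the record layout, the
helper names, and the simplification that each photo carries at most a
single place tag are assumptions made purely for illustration. This is
not how Flickr stores or retrieves anything.

```python
def make_photos(place, tagged, count):
    """Build `count` hypothetical photo records for one location.

    A photo gets the place name as a tag only if the public tagged it.
    """
    return [
        {"place": place, "tags": {place} if tagged else set()}
        for _ in range(count)
    ]


# Counts from the post; the ordering of the records is arbitrary.
photos = (
    make_photos("Brazil", tagged=True, count=2)      # first and last photo
    + make_photos("Brazil", tagged=False, count=7)   # in Brazil, never tagged so
    + make_photos("Guatemala", tagged=False, count=1)
    + make_photos("Mexico", tagged=False, count=2)
    + make_photos("Nicaragua", tagged=False, count=1)
    + make_photos("Alaska", tagged=False, count=1)
    + make_photos("Arizona", tagged=False, count=1)
)


def tag_recall(photos, place, tag):
    """Return (retrieved, relevant) for a click on `tag`.

    relevant  = photos actually taken in `place`
    retrieved = relevant photos that carry `tag`
    """
    relevant = [p for p in photos if p["place"] == place]
    retrieved = [p for p in relevant if tag in p["tags"]]
    return len(retrieved), len(relevant)


found, total = tag_recall(photos, "Brazil", "Brazil")
missed = total - found
print(f"{found} of {total} retrieved, {missed} hidden, recall = {found / total:.0%}")
# prints: 2 of 9 retrieved, 7 hidden, recall = 22%
```

The point is that someone clicking the tag sees only the 2 retrieved
photos. The denominator (9) and the 7 hidden photos are invisible unless
somebody goes through the set by hand, as I did.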

My experience shows people don't understand any of this and are actually
embarrassed when you demonstrate it to them. They try to explain it away
and then often reply they don't care, but I believe that is a
face-saving maneuver. Are we supposed to believe that they really and
truly don't care what they get from a search?! In my opinion, it is much
more the case that people do not want it to be true and prefer to ignore
it.

The comments to the photos are indeed very interesting. Some have
substantive information, e.g. in
http://www.flickr.com/photos/smithsonian/6985376261/in/set-72157629227635110/,
there is a discussion about the use of hats in field photographs (led by
the Smithsonian), and in this photo of a steamboat in Alaska,
http://www.flickr.com/photos/smithsonian/6985376089/in/set-72157629227635110/
someone has linked into Wikipedia and Project Gutenberg to give
additional information about this particular steamboat.

All in all, an impressive project by the Smithsonian, but in my opinion,
not so much for the reasons Clay Shirky gives. The Smithsonian staff
appear to have taken this as an opportunity for genuine outreach and I
am sure they have created some very good feelings about the Institution.
Kudos to them! It must have been a lot of work but rewarding as well.

After this short analysis, however, the huge tag cloud seems to hide as
much as it reveals. It shows the pitfalls of relying on an enthusiastic
public who are completely untrained and to whom the idea of providing
"consistent, reliable retrieval" is completely alien. Clay Shirky
discusses the tags "cyanotype", "moustache" and "steampunk". He is
obviously assuming something when he clicks on one of these tags. What
does he think he is seeing when he clicks on "moustache", I wonder? Does
he realize he is getting only a completely unknown and random
percentage, just as we can demonstrate with "Brazil"? Does he care?

In spite of all of this, I agree with the overall tenor of his talk, and
found it highly entertaining as well as educational. I recommend it to all.

-- 
*James Weinheimer* weinheimer.jim.l_at_gmail.com
*First Thus* http://catalogingmatters.blogspot.com/
*Cooperative Cataloging Rules*
http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts*
http://blog.jweinheimer.net/pweb/cataloging-matters-podcasts.html