Re: Authority in an Age of Open Access (an analysis)

From: Hardy, John <john.hardy_at_nyob>
Date: Tue, 6 Nov 2012 16:35:15 +0000
To: NGC4LIB_at_LISTSERV.ND.EDU

As sharper minds than mine have no doubt pointed out on this list,
tagging in a single catalogue is never going to scale: if you assume
250,000 items and ten tags a day, it would take 68 years to tag
everything (disregarding acquisition and disposal and assuming that
everything is tagged just once!)
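
As a rough check of that figure, here is a minimal sketch of the arithmetic
in Python. The function name and the 365-day year are assumptions made for
illustration only; nothing here comes from an actual catalogue system.

```python
def years_to_tag(items: int, tags_per_day: int, days_per_year: int = 365) -> float:
    """Years needed to tag every item exactly once at a constant daily rate."""
    days_needed = items / tags_per_day
    return days_needed / days_per_year


# Figures from the message: 250,000 items, ten tags a day.
estimate = years_to_tag(250_000, 10)
print(f"{estimate:.1f} years")  # about 68.5, in line with the 68 quoted above
```

Raising tags_per_day, for example by pooling tags across many catalogues,
shortens the estimate proportionally.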

I'm all in favour of tagging in the catalogue, but there has to be a
shared pool of tags.

John Hardy

On 6 November 2012 16:22, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
> A few years ago adding tagging to the catalog was the big thing, several
> people did it. They found that users simply did not add tags in any
> significant number, people seemed uninterested in tagging in the catalog.
>
> I agree it would be awfully useful to have crowd-sourced enhanced access
> points (i.e., tagging), but it's a moot point if the crowd isn't going to
> participate.
>
> On 11/6/2012 8:17 AM, Joseph Montibello wrote:
>>
>> Hi Jim,
>>
>> Thanks for bringing this talk to the list - I'm listening now and hoping
>> to watch the whole thing later today.
>>
>> I want to push back a little against you on your analysis of the tagging.
>> I don't think that the point of the tagging is to provide consistency or a
>> centralized, authoritative structure for exploring the whole pile.
>>
>> Rather, the point of it is to allow people to produce some metadata to
>> search with / by, for dimensions that wouldn't be covered by traditional
>> cataloging.
>>
>> I don't see this kind of crowdsourcing as a replacement for cataloging in
>> the library world.  Rather, I see it as an extension of (some form of)
>> cataloging into areas where catalogers don't have the resources to go
>> (2500 images in this flickr group alone). In addition, I see it as an
>> extension of cataloging within library data. If my library's catalog
>> doesn't group together all the Bollywood movies that I love, why shouldn't
>> I be able to group them together for myself, and for anyone else who might
>> want to find things grouped that way?
>>
>> And if my list gets only 75% of the Bollywood movies in the collection,
>> isn't that a better result for other users than finding that they have to
>> create a list themselves, each time they want to pull up this group of
>> movies? I think it's better. And if they look at my list and think, "I can
>> improve this," that's where things start to get interesting.
>>
>> This kind of tagging can be a substitute for real cataloging in those
>> areas where catalogers just don't have time to go.  I think it's most
>> useful as an add-on in areas where real cataloging exists, though. Much
>> easier to build a good tag library around "steampunk" if you have name and
>> subject headings that you can use to find things you want to tag.
>>
>> Have a good one,
>> Joe Montibello, MLIS
>> Library Systems Manager
>> Dartmouth College Library
>> 603.646.9394
>> joseph.montibello_at_dartmouth.edu
>>
>> On 11/6/12 6:10 AM, "James Weinheimer" <weinheimer.jim.l_at_GMAIL.COM> wrote:
>>
>>> Apologies for cross-posting, but I thought both lists would be
>>> interested.
>>>
>>> I would like to share a talk by Clay Shirky, the Internet guru, entitled
>>> "Authority in an Age of Open Access"
>>> http://www.cornell.edu/video/?videoID=2396.
>>>
>>> In one part (5 minutes in), he talks about a project of the Smithsonian
>>> Institution, when they put up several thousand images on Flickr and
>>> asked people (anyone) to tag them.
>>> http://www.flickr.com/photos/smithsonian/. He says that this shows what
>>> happens when you "take a job solely for curators and you invite the
>>> public in". He then goes on to mention how there is now a huge,
>>> tremendous list of tags produced by the public and discusses three tags
>>> of interest to him. He obviously considers the huge list of tags as a
>>> positive, but his talk goes in directions different from what I want to
>>> pursue here.
>>>
>>> As a cataloger, I look at it a little differently. The public
>>> undoubtedly did a huge amount of work on these images and all can see
>>> it, but from the viewpoint of access, what is the result? Of course,
>>> there are lots of images and I cannot look at them all, but I chose one
>>> set at random (15 images) "Mary Agnes Chase Field Books"
>>>
>>> http://www.flickr.com/photos/smithsonian/sets/72157629227635110/with/6985375963/
>>> and considered the tags that were--and were not--assigned.
>>>
>>> The first thing I discovered was that there is practically no
>>> consistency of the tags within the set. Just looking at the first two
>>> photos illustrates it. The first is a wonderful photo of two little
>>> girls in Brazil labeled "Two of Agnes Chase's favorite subjects."
>>>
>>> http://www.flickr.com/photos/smithsonian/6985375815/in/set-72157629227635110/
>>> and there are several tags:
>>> children, girls, two, seated, steps, outdoors, Brazil, 1920s, twenties
>>>
>>> The next photo, just as interesting, is labeled "Serra da Gramma [sic].
>>> Dr. Rolfs, jungly bamboo slope between fazendo and Araponga."
>>>
>>> http://www.flickr.com/photos/smithsonian/6985375845/in/set-72157629227635110/
>>> But it lacks any tags at all. I don't know the subject area, but I did
>>> find "Arapongas (Paraná, Brazil)" in the NAF. Yet, if you look in the
>>> comment section, one person, "Pixel Wrangler", made some suggestions for
>>> corrections, one of which was actually implemented by the Smithsonian.
>>> At the same time, the Smithsonian staff member (librarian?) was able to
>>> explain a couple of fine points. This led one person to remark "wow
>>> .....", but I don't know if it was the photo this person found so amazing
>>> or the exchange between "Pixel Wrangler" and "Smithsonian Institution".
>>>
>>> Looking at the rest of the photos as a whole, only the first and last
>>> had geographic location (Brazil), although a total of 9 are in Brazil, 1
>>> Guatemala, 2 Mexico, 1 Nicaragua, 1 Alaska, 1 Arizona.
>>>
>>> 8 out of the 15 (the majority) had no tags at all, other than those the
>>> Smithsonian gave to each one: "Smithsonian Institution Archives,
>>> Smithsonian Institution, Women's History Month". Of those that had tags,
>>> some photos had National park areas added, e.g. "Itatiaia National Park"
>>> which is "Parque Nacional do Itatiaia (Brazil)" in the NAF.
>>>
>>> Some conclusions from this highly cursory analysis: looking at the huge
>>> tag cloud http://www.flickr.com/photos/smithsonian/tags/ should now give
>>> someone pause. We now know that the tags for "Brazil"
>>> http://www.flickr.com/photos/smithsonian/tags/brazil/ are *not* all the
>>> photos of Brazil, even within this small 15-photo collection. We see
>>> only two when there should be at least nine. Who knows how many photos
>>> of Brazil there are within the rest of the collection? If this is so
>>> undeniably true for this single tag, what are you really looking at for
>>> each of the rest of the tags? The first photo has the tags "girls"
>>> and "children" but this photo has nothing
>>>
>>> http://www.flickr.com/photos/smithsonian/6839255684/in/set-72157629227635110/.
>>> When you click on the tag "children" in the huge tag cloud, you will
>>> *not* retrieve this photo. This shows how people assume a lot when they
>>> click on a tag. (Of course, this applies equally to all headings in a
>>> library catalog.)
>>>
>>> Or perhaps people don't assume. Or maybe they don't care. Nevertheless,
>>> they should be aware of something that seems so vital, and yet so easily
>>> hidden, as are the 7 photos from this collection when someone clicks on
>>> the Brazil tag. How is somebody supposed to know?
>>>
>>> My experience shows people don't understand any of this and are actually
>>> embarrassed when you demonstrate it to them. They try to explain it away
>>> and then often reply they don't care, but I believe that is a
>>> face-saving maneuver. Are we supposed to believe that they really and
>>> truly don't care what they get from a search?! In my opinion, it is much
>>> more the case that people do not want it to be true and prefer to ignore
>>> it.
>>>
>>> The comments to the photos are indeed very interesting. Some have
>>> substantive information, e.g. in
>>>
>>> http://www.flickr.com/photos/smithsonian/6985376261/in/set-72157629227635110/,
>>> there is a discussion about the use of hats in field photographs (led by
>>> the Smithsonian), and in this photo of a steamboat in Alaska,
>>>
>>> http://www.flickr.com/photos/smithsonian/6985376089/in/set-72157629227635110/
>>> someone has linked into Wikipedia and Project Gutenberg to give
>>> additional information about this particular steamboat.
>>>
>>> All in all, an impressive project by the Smithsonian, but in my opinion,
>>> not so much for the reasons Clay Shirky gives. The Smithsonian staff
>>> appear to have taken this as an opportunity for genuine outreach and I
>>> am sure they have created some very good feelings about the Institution.
>>> Kudos to them! It must have been a lot of work but rewarding as well.
>>>
>>> After this short analysis, however, the huge tag cloud seems to hide as
>>> much as it reveals. It shows the pitfalls of relying on an enthusiastic
>>> public who are completely untrained and to whom the idea of providing
>>> "consistent, reliable retrieval" is completely alien. Clay Shirky
>>> discusses the tags "cyanotype", "moustache" and "steampunk". He is
>>> obviously assuming something when he clicks on one of these tags. What
>>> does he think he is seeing when he clicks on "moustache", I wonder? Does
>>> he realize he is getting only a completely unknown and random
>>> percentage, just as we can demonstrate with "Brazil"? Does he care?
>>>
>>> In spite of all of this, I agree with the overall tenor of his talk, and
>>> found it highly entertaining as well as educational. I suggest it to all.
>>>
>>> --
>>> *James Weinheimer* weinheimer.jim.l_at_gmail.com
>>> *First Thus* http://catalogingmatters.blogspot.com/
>>> *Cooperative Cataloging Rules*
>>> http://sites.google.com/site/opencatalogingrules/
>>> *Cataloging Matters Podcasts*
>>> http://blog.jweinheimer.net/pweb/cataloging-matters-podcasts.html
>>>
>>
>>
>

-- 
John Hardy

Senior Analyst, libraries

Software Services

Capita, Knights Court, Solihull Parkway, Birmingham Business Park B37 7YB

Tel: +44 (0)870 400 5421

Mobile: +44 (0)7977 102347

Skype (Office): jlgh1949
email: john.hardy_at_capita.co.uk
 www.capita.co.uk/software
