Yes, it has been interesting to see how dirty the data is. As you
suggest below, we have been using this as a way to help clean up our
data.
-Tod
On Dec 21, 2007, at 9:16 AM, Ross Singer wrote:
> Right, I get that there will be legitimate variations, but:
> http://tinyurl.com/26xc63
>
> is just a typo (at least by the context it appeared in the word cloud
> which was from a search for 'illinois')
>
> -Ross.
>
> On Dec 21, 2007 10:05 AM, Prestamo, Anne <anne.prestamo_at_okstate.edu>
> wrote:
>> We implemented AquaBrowser last summer: http://boss.library.okstate.edu
>> We were astounded at the number of "variant spellings" that appeared in
>> the Word Cloud, and initially thought that our catalogers would be
>> deluged with requests to make corrections in our records. As we
>> investigated further, we realized that many of the spelling variants are
>> legitimate spellings that, in our case, come largely from two sources:
>> 1) the ~90,000 records for Early English Books Online; and 2) the
>> SyndeticsICE searchable tables of contents.
>>
>> My favorite example is "McShakespeare", which appears in the Word Cloud
>> if you do a search for "Shakespeare". "McShakespeare" comes from the ToC
>> for Screening Shakespeare in the twenty-first century / edited by Mark
>> Thornton Burnett and Ramona Wray. Here's the ToC. "McShakespeare"
>> appears in the Chapter 9 title.
>>
>> Table of Contents
>>
>> 1 'If I'm right' : Michael Wood's In search of Shakespeare
>>   Richard Dutton, p. 13
>> 2 'I see my father' in 'my mind's eye' : surveillance and the filmic
>>   Hamlet
>>   Mark Thornton Burnett, p. 31
>> 3 Backstage pass(ing) : Stage Beauty, Othello and the make-up of race
>>   Richard Burt, p. 53
>> 4 The postnostalgic Renaissance : the 'place' of Liverpool in Don Boyd's
>>   My kingdom
>>   Courtney Lehmann, p. 72
>> 5 Our Shakespeares : British television and the strains of
>>   multiculturalism
>>   Susanne Greenhalgh and Robert Shaughnessy, p. 90
>> 6 Looking for shylock : Stephen Greenblatt, Michael Radford and Al Pacino
>>   Samuel Crowl, p. 113
>> 7 Speaking Maori Shakespeare : the Maori Merchant of Venice and the
>>   legacy of colonisation
>>   Chatherine Silverstone, p. 127
>> 8 'Into a thousand parts divide one man' : dehumanised metafiction and
>>   fragmented documentary in Peter Babakitis' Henry V
>>   Sarah Hatchuel, p. 146
>> 9 Screening the McShakespeare in post-millennial Shakespeare cinema
>>   Carolyn Jess-Cooke, p. 163
>> 10 Shakespeare and the singletons, or, Beatrice meets Bridget Jones :
>>   post-feminism, popular culture and 'Shakespea(re)-told'
>>   Ramona Wray, p. 185
>>
>> Table of Contents provided by Blackwell Book Services and R R Bowker LLC
>> (c) 2007.
>>
>> Anne Prestamo
>> Associate Dean for Collection and Technology Services
>> 216 Library
>> Oklahoma State University Library
>> Stillwater, OK 74078-1071
>> Phone: 405-744-9755 FAX: 405-744-7579
>> Email: anne.prestamo_at_okstate.edu
>>
>> -----Original Message-----
>> From: Next generation catalogs for libraries
>> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Ross Singer
>> Sent: Friday, December 21, 2007 8:44 AM
>> To: NGC4LIB_at_listserv.nd.edu
>> Subject: Re: [NGC4LIB] Aqua Browser in beta at U. Chicago
>>
>> While I've never particularly liked the visual display from AquaBrowser
>> (or, similarly, Grokker), one thing I found interesting was how many
>> /misspellings/ the spelling variation feature exposed in the search
>> result data. If UChicago were to set up a policy to fix typos when they
>> find them, this would actually be a really useful feature for
>> maintaining data integrity (and for shaming the library by pointing out
>> stupid typos until the data is fixed).
>>
>> -Ross.
>>
>> On Dec 21, 2007 4:55 AM, Stephens, Owen <o.stephens_at_imperial.ac.uk>
>> wrote:
>>
>>> I also have some doubts about the usefulness of the visual browser (aka
>>> word cloud), but I think that probably U. Chicago would acknowledge the
>>> mixed feelings about this, based on the study they did on AquaBrowser
>>> (http://califa.org/uploadfiles/report_%20final_%202006_10_03.pdf):
>>>
>>> "The word cloud elicited the most mixed responses. Although several
>>
>>> subjects used the
>>
>>> word cloud to eventually identify new materials, few felt they
>>
>>> understood how it worked.
>>
>>> Despite finding it confusing, some subjects found it compelling
>>> enough
>>
>>> to want to
>>
>>> continue to experiment with it. Our study indicates that the
>> suggestion
>>
>>> of related terms to
>>
>>> users can help them find new materials, and that if not all terms in
>> the
>>
>>> word cloud were
>>
>>> relevant, holding the user's interest in these alternate
>>> possibilities
>>
>>> may be important."
>>
>>>
>>
>>> I also think that focussing on the word cloud (although, as the report
>>> says, the word cloud unsurprisingly attracts attention) detracts from
>>> the rest of the product, which, as far as I can see, is doing much of
>>> what other products in this space are doing or trying to do:
>>>
>>> Faceted browsing
>>> Relevance ranking
>>> RSS feeds of results
>>>
>>> I'm not clear from Nancy's criticism whether she simply disliked the
>>> word cloud, and this, for her, overrode any other positives, or whether
>>> she felt that the implementation of these NGC-type features was
>>> particularly weak in AquaBrowser compared to other systems out there
>>> (and if so, it would be good to explore which are the strongest
>>> implementations and how they differ).
>>>
>>> Note that the word cloud can be 'closed' so that the user doesn't need
>>> to see it, but you then lose all the functionality that has been put
>>> into the word cloud, which includes the spelling alternatives. I think
>>> this is probably a mistake, and it would be nice to have the spelling
>>> suggestions as text as well as in the visual display. My instinct is
>>> that pulling the spelling function out of the cloud would make the
>>> cloud more useful (less clutter) and the spelling alternatives more
>>> obvious.
>>>
>>> Overall I like the implementation, and I applaud U Chicago both for
>>> trying something different and for doing it relatively well. However, I
>>> do have some (hopefully constructive) criticism.
>>>
>>> Firstly, when I clicked through to 'more' on the Author facet, I found
>>> it frustrating that the default sort order was relevance rather than
>>> alphabetical. I feel that once users have clicked 'more' here, they are
>>> likely to be looking for someone specific (why else click on the author
>>> facet?), so an alphabetical listing will make the list easier to
>>> navigate.
>>>
>>> Secondly, when I click through to the Author facet, I still don't get
>>> the chance to see all the authors connected to my search, so if the
>>> person I'm looking for hasn't written much, I may go away thinking that
>>> the library hasn't got anything by them.
>>>
>>> To take a slightly contrived example:
>>>
>>> I'm looking for books by Alfred Emerson (a Professor of Zoology at U
>>> Chicago), and I rather naively search for 'Emerson'. Unsurprisingly, a
>>> lot of the hits are about or by R.W. Emerson. The author facet lists 5
>>> authors, none of whom is Alfred Emerson, but I see that there are '3882
>>> more' and click through. I find what seems to be a randomly ordered
>>> list of authors (most of whom are not even 'Emerson', never mind
>>> 'Alfred Emerson'). It takes me a few moments to realise they are listed
>>> by the number of items related to them, and slightly longer to find the
>>> 'alphabet' sort option. After re-sorting, I find that there is still no
>>> 'Emerson, Alfred' listed. I find the 'and more - not shown' note, but
>>> there are no options to see the 'not shown' hits.
>>>
>>> OK, so if I search for 'alfred emerson' in the first place, I find the
>>> right stuff, and perhaps the example is bogus. But in the end it bugs
>>> me that I can't see all the authors related to my search results. Why
>>> not, if that's what I want to do?
>>>
>>> Going back to 'sort alphabetical' vs 'sort relevance': it would be nice
>>> if it remembered my preference on a facet-by-facet basis. Each time I
>>> go back to the author facet I have to re-sort alphabetically. In the
>>> above example, if I narrow my search by the LCSH facet 'Q - Science'
>>> and then go back to the 'Author' facet to find Alfred, he is in there,
>>> but the facet has re-sorted by relevance, so he isn't easy to spot.
>>>
>>> Happy Christmas to all...
>>>
>>> Owen
>>>
>>> Owen Stephens
>>> Assistant Director: e-Strategy and Information Resources
>>> Imperial College London Library
>>> Imperial College London
>>> South Kensington
>>> London SW7 2AZ
>>>
>>> Tel: 020 7594 8829
>>> Email: o.stephens_at_imperial.ac.uk
Received on Fri Dec 21 2007 - 11:04:39 EST