Charley,
Thanks for your reply. Good stuff! My bad for saying "bad data".
What I really mean is that faceted browsing works best with a
small set of values, and it's "bad", in my opinion, to have so many
one-off headings. My effort with Solr Flare and the code4lib
preconference preparation is right in line with the process you
described: let the catalogers see all the data in a way they've never
seen before, and have them clean up and refine it.
Here I thought I was working on a revolutionary findability
application, and it turns out it's a decent data cleaner too :)
Erik
On Jan 17, 2007, at 10:38 AM, Charley Pennell wrote:
> Erik-
>
> The decision to limit the number of facets to the top 30 was not due
> to bad data, but rather to the desire to keep the interface simple
> enough to not be intimidating to users, and to not slow down
> retrieval/display of results. Before we went live to the public, we
> allowed all facets to be displayed on our test implementation. This, I
> might add, was very handy for catalogers, who were able to see and fix
> the myriad problems associated with incorrect subfield codes and field
> tags: in particular, form/genre headings that showed up as topical,
> geographic subheadings that showed up as topical or chronological,
> and other problems. Response time on the development
> server was not as good as it could have been had we limited the
> expansion of the facets, even under a low usage load, but more
> importantly, facet lists scrolled on forever. This should not be too
> surprising, given the literally tens and even hundreds of thousands of
> unique authors, topical, genre, and geographic headings in our
> catalogs. There were discussions about the sort order that should be
> presented for these facets as well. Obviously, sorting faceted terms
> by frequency is useful for indicating something about the relevance of
> certain terms or names to your search, but beyond a certain point this
> is not very useful when you are trying to research into the "long
> tail". Getting into these more unique, less frequently posted facets is
> now really only possible when one increases precision by adding more
> terms or filters to one's search, which is the behavior we are
> rewarding by limiting the display as we have. I assume that FCLA is
> having similar discussions amongst its stakeholders about the
> trade-offs they will need to make to optimize Endeca for their users.
>
> Charley
>
> Erik Hatcher wrote:
>> Are there facet values not being shown? From what I hear of NCSU's
>> system, only the top 30 or so facets are shown, and the others are so
>> much lower in frequency as to be considered bad data. I'm curious
>> whether your implementation is configured similarly.
>
> --
> Charley Pennell mailto:cpennell_at_unity.ncsu.edu
> Principal Cataloger for Metadata voice: (919)515-2743
> Metadata and Cataloging Department fax: (919)515-7292
> NCSU Libraries, Box 7111
> North Carolina State University
> Raleigh, NC 27695-7111
>
> Adjunct Librarian, Memorial University of Newfoundland
> World Wide Web: http://www.ibiblio.org/hillwilliam/chuckhome.html
Received on Wed Jan 17 2007 - 10:46:00 EST