Re: An Endeca-based Union Catalog

From: Erik Hatcher <esh6h_at_nyob>
Date: Wed, 17 Jan 2007 11:21:49 -0500
To: NGC4LIB_at_listserv.nd.edu

Charley,

Thanks for your reply.  Good stuff!   My bad for saying "bad data".
What I really mean is that faceted browsing works best with a
small set of values, and it's "bad", in my opinion, to have so many
one-off headings.  My effort with Solr Flare and the code4lib
pre-conference preparation follows the same process you described:
let the catalogers see all the data in a way they've never seen
before and have them clean it up and refine it.
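A minimal Python sketch of that idea, assuming records are plain dicts
with a hypothetical "subject" list field (this is not Endeca's or
Solr's actual data model, and the function names are hypothetical):
count each heading once per record, display only the top N values by
frequency (N=30, as NCSU does), and list the one-off headings a
cataloger could review for typos or miscoded subfields.  In Solr the
display cutoff corresponds roughly to the facet.limit request
parameter.

```python
from collections import Counter
from typing import Iterable


def facet_counts(records: Iterable[dict], field: str) -> Counter:
    """Count how many records carry each value of a (hypothetical) facet field."""
    counts: Counter = Counter()
    for record in records:
        # Count each distinct value once per record; missing fields count as empty.
        counts.update(set(record.get(field, [])))
    return counts


def top_facets(counts: Counter, top_n: int = 30) -> list[tuple[str, int]]:
    """Return the top_n values by frequency; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def one_off_values(counts: Counter) -> list[str]:
    """Values posted to exactly one record: the long tail a top-N display hides."""
    return sorted(value for value, n in counts.items() if n == 1)


if __name__ == "__main__":
    records = [
        {"subject": ["Cataloging", "Metadata"]},
        {"subject": ["Cataloging", "Faceted search"]},
        {"subject": ["Cataloging", "Metadata"]},
        {"subject": ["Catalogng"]},  # a typo that surfaces as a one-off heading
    ]
    counts = facet_counts(records, "subject")
    print(top_facets(counts, top_n=2))  # [('Cataloging', 3), ('Metadata', 2)]
    print(one_off_values(counts))       # ['Catalogng', 'Faceted search']
```

Breaking ties alphabetically keeps the displayed list stable between
requests; the one-off list is exactly what users never see in a top-N
display, which is why showing it to catalogers is useful.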

Here I thought I was working on a revolutionary findability
application, and it turns out it's a decent data cleaner too :)

        Erik

On Jan 17, 2007, at 10:38 AM, Charley Pennell wrote:

> Erik-
>
>  The decision to limit the number of facets to the top 30 was not due
> to bad data, but rather to the desire to keep the interface simple
> enough not to intimidate users, and to avoid slowing down the
> retrieval and display of results.  Before we went live to the public,
> we allowed all facets to be displayed on our test implementation.
> This, I might add, was very handy for catalogers, who were able to see
> and fix the myriad problems associated with incorrect subfield codes
> and field tags: form/genre headings that showed up as topical,
> geographic subheadings showing up as topical or chronological, and
> other problems.  Response time on the development server was not as
> good as it would have been had we limited the expansion of the facets,
> even under a low usage load, but more importantly, facet lists
> scrolled on forever.  This should not be too surprising, given the
> literally tens and even hundreds of thousands of unique author,
> topical, genre, and geographic headings in our catalogs.  There were
> also discussions about the order in which these facets should be
> sorted.  Obviously, sorting faceted terms by frequency is useful for
> indicating something about the relevance of certain terms or names to
> your search, but beyond a certain point this is not very useful when
> you are trying to research the "long tail".  Getting at these rarer,
> less-posted facets is now really only possible when one increases
> precision by adding terms or filters to one's search, which is the
> behavior we are rewarding by limiting the display as we have.  I
> assume that FCLA is having similar discussions amongst its
> stakeholders about the trade-offs they will need to make to optimize
> Endeca for their users.
>
>    Charley
>
> Erik Hatcher wrote:
>> Are there facet values not being shown?  From what I hear of NCSU's
>> system, only the top 30 or so facets are shown, and the others are so
>> much lower in frequency as to be considered bad data.  I'm curious if
>> this is a similar configuration in your implementation?
>
> --
> __________________________________ __________________________________
> """""""""""""""""""""""""""""""""" """"""""""""""""""""""""""""""""""
> Charley Pennell                        mailto:cpennell_at_unity.ncsu.edu
> Principal Cataloger for Metadata                 voice: (919)515-2743
> Metadata and Cataloging Department                 fax: (919)515-7292
> NCSU Libraries, Box 7111
> North Carolina State University
> Raleigh, NC  27695-7111
>
>      Adjunct Librarian, Memorial University of Newfoundland
> World Wide Web:     http://www.ibiblio.org/hillwilliam/chuckhome.html
> __________________________________ __________________________________
> """""""""""""""""""""""""""""""""" """"""""""""""""""""""""""""""""""