Re: Using controlled vocabularies to enhance search/browse

From: Leslie Johnston <johnston_at_nyob> Date: Fri, 9 Jun 2006 13:23:04 -0400 To: NGC4LIB_at_listserv.nd.edu

The issue that we're dealing with as a stumbling block to developing a
faceted browse interface for our digital library repository is that
it's an aggregation of formats and collections with multiple
controlled vocabularies -- faculty-developed image sets with their
own ontologies (or none), images with AAT terms, EAD, TEI texts with
LCSH, and fiction TEI texts that mostly do not have LCSH because
practice has not required that they be assigned subject
headings.  The metadata for these materials were also created using
varying tools -- bibliographic tools for MARC, XML tools for EAD, a
visual resources database for the Library's images, and wildly
varying tools for faculty projects.

How much normalization can we afford to do in terms of staff
time?  Subject terms?  Names?  Locations?  Genre?  And one of the
most challenging, Dates?  (Although the CDL date normalization tool
is a huge step forward in that area.)  If people think creating
metadata is expensive, try normalizing it.  "Standards" can be highly
variable.  The logic for capturing just the expression of
responsibility from a TEI header was astonishing to me when I first
saw it because of the many possibilities for the encoding.   Sure,
some of it can be done programmatically, but not all, and first you
have to design the logic and write the scripts ...

To index our materials together we crosswalk to a local metadata
element standard, programmatically generate processed metadata files,
and index those.  At least we've got consistent fielding, even if we
don't yet have consistent vocabularies.
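
As a rough illustration of the crosswalk step, here is a minimal sketch. The local field names, the source field keys, and the sample records are all hypothetical; they stand in for whatever a local element set and its source formats actually define, and this is not our production code.

```python
# Hypothetical local element set; field names are illustrative only.
LOCAL_FIELDS = ("title", "creator", "subject", "date")

# Hypothetical crosswalk tables, one per source format:
# source field key -> local field name.
CROSSWALKS = {
    "marc": {"245a": "title", "100a": "creator", "650a": "subject", "260c": "date"},
    "ead": {"unittitle": "title", "origination": "creator",
            "subject": "subject", "unitdate": "date"},
    "tei": {"titleStmt/title": "title", "titleStmt/author": "creator",
            "keywords/term": "subject", "publicationStmt/date": "date"},
}


def crosswalk(source_format, record):
    """Map one source record (field key -> list of values) onto the local fields."""
    mapping = CROSSWALKS[source_format]
    out = {field: [] for field in LOCAL_FIELDS}
    for source_field, values in record.items():
        local_field = mapping.get(source_field)
        if local_field is None:
            continue  # unmapped source fields are simply dropped in this sketch
        out[local_field].extend(value.strip() for value in values)
    return out


# Made-up sample records in two different source formats.
marc_record = {"245a": ["Views of Charlottesville "], "650a": ["Cities and towns"]}
tei_record = {"titleStmt/title": ["A Novel"], "titleStmt/author": ["Doe, Jane"]}

# Every processed record has the same fields, whatever its source format.
processed = [crosswalk("marc", marc_record), crosswalk("tei", tei_record)]
```

Note that the fiction TEI record comes out with an empty subject list: the fielding is consistent, but nothing in this step reconciles or supplies vocabulary terms.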

We'll likely be starting on another prototype metasearch
implementation this summer (our first prototype did not go into
production), and I'm looking forward to it.

Leslie

At 07:51 PM 6/8/2006, Riley, Jenn wrote:
>(Changing the subject line as this is veering off into new territory...)
>
>Holly, you've really hit on something here. While incorporating the
>syndetic structure of controlled vocabularies into search and browse
>isn't without its issues, it's strongly supported by a significant
>amount of user research. While we aren't using anything like this in the
>vended ILS at Indiana University, we are increasingly doing it in our
>home-grown digital library systems.
>
>For a good example, see our Charles W. Cushman Photograph Collection at
><http://www.dlib.indiana.edu/collections/cushman/>. Any search that
>matches a lead-in term (see reference) from the LC Thesaurus for
>Graphic Materials I: Subject Terms (TGM I) immediately maps that
>lead-in term to the authorized term and shows the user search results
>right away. There's an indication on the results page that the lead-in
>term maps to something else, but the user sees results immediately just
>as if she'd typed in the authorized term. No telling the user she's
>using the "wrong" words in her search. Any search or browse that matches
>a subject heading (either directly or through a lead-in term) retrieves
>results using that term or any term narrower than it in the hierarchy.
>For example, a search/browse on "sports" would retrieve images cataloged
>with "sports" but also images cataloged with "basketball," "baseball,"
>and "curling" if those terms were also used in the collection. On the
>results screen, for each subject term matched, the next broader and
>narrower terms for it in the thesaurus are listed, allowing the user to
>expand or refine her search based on what she sees in the initial result
>set.
>
>To see some of this in action, search for "boats" from the collection
>search page.
>
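
The behavior described above (mapping a lead-in term to its authorized term, then retrieving on that term plus everything narrower) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the Cushman implementation: the thesaurus fragment, the index, and the function names are all made up, and the terms are not taken from TGM I.

```python
# Hypothetical thesaurus fragment; terms and relationships are illustrative only.
USE_FOR = {"ships": "boats", "athletics": "sports"}  # lead-in term -> authorized term
NARROWER = {
    "sports": ["basketball", "baseball", "curling"],
    "boats": ["canoes", "sailboats"],
}


def authorized(term):
    """Normalize the typed term and follow a see reference if one exists."""
    term = term.strip().lower()
    return USE_FOR.get(term, term)


def expand(term):
    """Return the term plus every narrower term below it in the hierarchy."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles or repeated terms
        seen.append(current)
        stack.extend(NARROWER.get(current, []))
    return seen


def search(query, index):
    """Return (authorized term used, sorted ids of matching records)."""
    term = authorized(query)
    hits = set()
    for candidate in expand(term):
        hits.update(index.get(candidate, ()))
    return term, sorted(hits)


# Hypothetical index: subject term -> record ids cataloged with that term.
INDEX = {"sports": ["img1"], "curling": ["img2"], "sailboats": ["img3"]}

term_used, results = search("Athletics", INDEX)
```

Returning the authorized term alongside the hits is what lets a results page show that the typed term was mapped to something else, without making the user search again.
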
>This process was developed through our dissatisfaction with literal,
>string-based subject searching, studying the literature on controlled
>vocabulary usage, a series of user studies, and support from a number of
>people in the IU Digital Library Program for a vision of how we might
>better meet our users' needs. This type of functionality comes from a
>perspective that says the system should get people where they want to go
>no matter *what* they type in - that users should focus on *using* the
>materials they find, rather than expending that effort learning the
>"right" way to discover those same materials. Our users can do better
>work with library materials if they don't have to spend so much of their
>time looking for them.
>
>We're incorporating these ideas into our larger digital library
>infrastructure as we speak. A brief explanation of how this is all
>accomplished in the Cushman collection can be found at
><http://webapp1.dlib.indiana.edu/cushman/projectInfo/techImplementation.jsp>.
>A larger, more in-depth explanation including the user studies we
>performed that convinced us this strategy was the right way to go can be
>found in Dalmau, Michelle, Randall Floyd, Dazhi Jiao, and Jenn Riley.
>"Integrating Thesaurus Relationships into Search and Browse in an Online
>Photograph Collection." Library Hi Tech 23, no. 3 (2005): 425-452;
><http://www.emeraldinsight.com/10.1108/07378830510621829>. (That link
>should provide free access to the paper, at least for the time being.
>Please let me know if that's not the case!)
>
>Jenn
>
>========================
>Jenn Riley
>Metadata Librarian
>Digital Library Program
>Indiana University - Bloomington
>Wells Library E170
>(812) 856-5759
>www.dlib.indiana.edu
>
>Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
>
>
>
>________________________________
>
>         From: Next generation catalogs for libraries
>[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Holly Ledvina
>         Sent: Thursday, June 08, 2006 10:15 AM
>         To: NGC4LIB_at_LISTSERV.ND.EDU
>         Subject: Re: [NGC4LIB] who is the primary user?
>
>
>         I agree that we need to examine how patrons use the catalog to
>determine if they are finding vs searching. I have been using the search
>transaction logs in our system to determine which subject searches
>retrieve no hits. After examining the results the patron sees - i.e.
>where does the "no results" search take them in the index, I add the
>term used as a 4xx see reference in our authority files. The 4xx see
>reference will then take the patron to a catalog message that the term
>is not used in this catalog but to search using the "subject heading"
>listed which links directly to the subject term.
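
The log review described above could start from something like the sketch below, which tallies subject searches that returned no hits so the most frequent ones can be considered as candidate 4xx see references. The log rows and their shape (query, hit count) are assumptions for illustration; real ILS transaction logs will look different and need their own parsing.

```python
from collections import Counter

# Hypothetical log rows: (query as typed, number of hits).
LOG = [
    ("cookery", 12),
    ("cooking", 0),
    ("Cooking ", 0),
    ("movies", 0),
    ("motion pictures", 40),
]


def zero_hit_queries(rows):
    """Count normalized queries that returned no results, most frequent first."""
    counts = Counter(query.strip().lower() for query, hits in rows if hits == 0)
    return counts.most_common()


# Candidate see references, still to be reviewed by a cataloger.
candidates = zero_hit_queries(LOG)
```

Deciding which heading each failed term should point to remains a human judgment; the tally only shows where to look first.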
>
>         While this referral directs the patron to the "term used" it is
>nonetheless an intermediary step and click. What I would really like is
>a natural language interpretation within the catalog software that
>automatically directs the patron to the correct term of the controlled
>vocabulary, seamlessly. Maintain the controlled vocabulary but make it
>invisible to the patron. And yes, there are a gazillion problems with
>this thought, but it's patron-friendly and merits exploration. It may
>even be working somewhere in a library - anyone?
>
>         Holly Ledvina
>
>
>
>
>
>         K.G. Schneider wrote:
>
>                 I'm less interested in defining who users are than
>                 examining what they do and working backwards from
>                 that premise.
>
>                 I have a hypothesis: search logs (not transaction
>                 logs; but special logs that generate information
>                 about search behavior) for a wide variety of
>                 libraries would yield highly similar data on the
>                 types of queries performed by users -- right down to
>                 top queries, lowest queries, top successes, top no
>                 hits, and patterns such as number of terms and
>                 complexity of queries.
>
>                 I have a bet: most libraries don't generate search
>                 logs or any similar search analytics for their user
>                 behavior. Much, much discussion; little, little data.
>
>                 I have an observation: companies such as Google
>                 aren't spending a lot of time worrying about their
>                 various "communities."  That's not to say that it's
>                 necessarily bad to do so... but as an initial
>                 preoccupation, we may be barking up the wrong tree.
>
>                 Why don't we start from the user data and work
>                 backwards? Re search logs, I'll show you mine if you
>                 show me yours...and we aren't even an OPAC (though
>                 due to our name a lot of users think we are, as our
>                 logs show -- something we'd like to address by
>                 building better no-results pages).
>
>                 Karen G. Schneider
>                 kgs_at_bluehighways.com
>
>
>
>         --
>         Holly Ledvina
>         Catalog Librarian
>         Outagamie Waupaca Library System
>         225 N. Oneida Street, Appleton, WI 54911
>         hledvina_at_mail.owls.lib.wi.us
>         920-832-6386
>
>         "If we are to have an educated and informed population we need a
>strong and open library system supported by a committed administration.
>We cannot call for a revival of quality education in America and close
>our libraries.  We cannot ask our children to learn to read and take
>away their books."  Jimmy Carter.

------------
Leslie Johnston
Head, Digital Access Services
University of Virginia Library
http://lib.virginia.edu/digital/
http://lib.virginia.edu/digital/das/
johnston_at_virginia.edu