Re: [Fwd: NGC4LIB Digest - 30 Mar 2008 to 31 Mar 2008 (#2008-67)]

From: Emily Lynema <emily_lynema_at_nyob> Date: Wed, 2 Apr 2008 11:43:18 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

Sorry, that was an unintentional send!  How embarrassing.

In the context of this discussion, just wanted to note that we've
experimented with the idea of using the uncontrolled text in
bibliographic records (like title and table of contents) to point users
to possibly relevant subject headings that have been assigned to the
records retrieved for their natural language search. It works well for
some types of searches, but not as well for others.
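
For illustration only, here is a minimal Python sketch of the general idea (not the NCSU implementation). The record structure, the `subjects` key, and the function name are all assumptions, and the sample records are made up:

```python
from collections import Counter

def suggest_headings(records, limit=5):
    """Rank subject headings by how many retrieved records carry them."""
    counts = Counter()
    for record in records:
        # Count each heading once per record, even if it is repeated.
        counts.update(set(record.get("subjects", [])))
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [heading for heading, _ in ranked[:limit]]

# Hypothetical records retrieved by a keyword search on title/contents.
retrieved = [
    {"subjects": ["United States--History--Revolution, 1775-1783",
                  "Revolutions"]},
    {"subjects": ["United States--History--Revolution, 1775-1783"]},
    {"subjects": ["France--History--Revolution, 1789-1799"]},
]

suggestions = suggest_headings(retrieved)
```

Because the ranking is driven purely by counts in the retrieved set, the makeup of the collection steers which headings come out on top.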

The 'revolutionary war' search problem is a good example (neither of
those terms is used in the relevant subject heading[s]). In our
pre-alpha system, the first suggested heading for this search is "United
States--History--Revolution, 1775-1783--Pictorial works." and the fourth
is the broader "United States--History--Revolution, 1775-1783." Our
collection, with its US bias, is going to steer what headings are
suggested to the user, although the second heading suggested is
"France--History--Revolution, 1789-1799." and the list also contains the
generic heading "Revolutions."

The one problem I'm not sure about with generic dictionary tools is
whether we will end up suggesting terms and topics to our users that
don't exist within our local collections. Is that a bad thing? I can see
the advantages of using the data within your own collection for
suggesting more appropriate terminology in that it will lead users to
fewer dead ends.
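
One possible middle ground, sketched here under assumptions (the function name and the simple case-insensitive matching are hypothetical, not something we have built), would be to check terms suggested by a generic tool against the headings actually present in the local collection:

```python
def split_by_local_holdings(suggested_terms, local_headings):
    """Separate suggested terms into those found locally and those not."""
    # Compare case-insensitively; real heading matching would need more
    # normalization (punctuation, subdivisions) than this sketch does.
    local = {heading.casefold() for heading in local_headings}
    held, not_held = [], []
    for term in suggested_terms:
        if term.casefold() in local:
            held.append(term)
        else:
            not_held.append(term)
    return held, not_held

# Hypothetical inputs: terms from a generic thesaurus vs. local headings.
held, not_held = split_by_local_holdings(
    ["Revolutions", "Coups d'etat"],
    ["revolutions", "Blues (Music)"],
)
```

Terms in `held` could be offered as links into the catalog, while those in `not_held` could be hidden or labeled as not available locally.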

-emily lynema
NCSU Libraries

Emily Lynema wrote:
> I've done a bit of experimentation with this taking another approach of
> suggesting the most popular subject headings that are found on records
> in the originally retrieved set. It actually works quite well with
> 'revolutionary war' in our collection (an excellent example we've used
> often, since users need to find out that the heading is actually United
> States--History--
>
> -------- Original Message --------
> Subject: NGC4LIB Digest - 30 Mar 2008 to 31 Mar 2008 (#2008-67)
> Date: Mon, 31 Mar 2008 23:02:57 -0400
> From: Automatic digest processor <LISTSERV_at_LISTSERV.ND.EDU>
> Reply-To: Next generation catalogs for libraries <NGC4LIB_at_LISTSERV.ND.EDU>
> To: Recipients of NGC4LIB digests <NGC4LIB_at_LISTSERV.ND.EDU>
>
> There are 8 messages totalling 459 lines in this issue.
>
> Topics of the day:
>
>   1. word tools (8)
>
> ----------------------------------------------------------------------
>
> Date:    Mon, 31 Mar 2008 07:27:06 -0400
> From:    Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: word tools
>
> If any number of word tools (dictionaries, thesauri, gazetteers,
> authority lists, encyclopedias, etc.) were at your disposal, then how
> would you employ them in the implementation of a "next generation"
> library catalog?
>
> As you know, the data for many of these word tools are freely
> available on the 'Net. True, much of the content is dated, but that
> does not make it 100% useless, just less useful than it could be.
> Moreover, some of this data is formatted in such a way that it can be
> retrieved programmatically. One example is through the DICT protocol. [1]
>
> If you had programmatic access to this word tool data, then how would
> you incorporate it into your library "catalog"? If I had such data I
> would use the dictionary function to confirm what I was looking for
> was what I had searched. I would use a thesaurus to suggest other
> search terms. I would use an authority list to provide See Also and
> See From references. I would use an encyclopedia to get an overview
> of the topic and then move on to the cited books and articles.
>
> For a good time, I toyed with this idea ever so briefly. I first
> installed a DICT server, downloaded subject authorities from FRED
> [2], and created a simple "dictionary" whose words were authority
> terms and "definitions" were the See From and See Also references.
> You folks who use Linux may be able to try this:
>
>    dict -h 208.81.177.118 -d subjects -s substring blues
>
> Which returns something like this:
>
>    From Subject authority list [subjects]:
>
>    Blues festivals
>          See from: Blues music festivals
>          See also: Music festivals
>
>    Blues (Fictitious character)
>          See from: Blues le chat (Fictitious character)
>
>    Blues (Music)
>          See from: Blues (Music)--United States
>          See from: Blues (Songs, etc.)
>          See from: Jive (Music)
>          See also: African Americans--Music
>          See also: Folk music--United States
>          See also: Popular music
>          See also: Rhythm and blues music
>          See also: Washboard band music
>
> This looks to me like an additional Did You Mean implementation. Food
> for thought on a Monday.
>
> [1] http://www.dict.org/
> [2] http://www.ibiblio.org/fred2.0/authorities/
>
> --
> Eric Lease Morgan
> University Libraries of Notre Dame
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 10:50:56 -0400
> From:    Jonathan Rochkind <rochkind_at_JHU.EDU>
> Subject: Re: word tools
>
> I too have been thinking about how to incorporate "see also" type
> references from our rich bibliographic records into our search
> functions. In addition to "did you mean", it's possible that in
> some/many cases, the search should be _automatically_ expanded. For
> instance, from Eric's examples, in some cases when the user enters
> "Jive" in a keyword search, should the search be automatically expanded
> to include "OR subject: Blues (Music)"?  I think maybe so. Somehow the
> user should probably be notified somewhere on screen that this happened,
> however.  And have the option to _disable_ it. A facetted interface
> probably helps here--or maybe vice versa, it's me assuming a facetted
> interface that led me to think about this.
>
> More food for thought.
>
> These kinds of "used for" references (a non-preferred 'lead in' term that
> points to a preferred index term) occur not only in LCSH authority
> records, and in some cases in personal/corporate name authority
> records, but perhaps also in other places in our records. For instance,
> most of our systems incorporate serial preceding/succeeding titles _as_
> legitimate titles in the (browse) title index for the record that does
> NOT have those titles. This often is confusing to the user. But there
> might be some way to treat those preceding/succeeding titles as
> non-preferred lead-ins in a keyword search that would be less confusing,
> especially if you could turn it off.
>
> Jonathan
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 09:42:57 -0700
> From:    Kyle Banerjee <kyle.banerjee_at_GMAIL.COM>
> Subject: Re: word tools
>
>> I too have been thinking about how to incorporate "see also" type
>>  references from our rich bibliographic records into our search
>>  functions. In addition to "did you mean", it's possible that in
>>  some/many cases, the search should be _automatically_ expanded. For
>>  instance, from Eric's examples, in some cases when the user enters
>>  "Jive" in a keyword search, should the search be automatically expanded
>>  to include "OR subject: Blues (Music)"?  I think maybe so. Somehow the
>>  user should probably be notified somewhere on screen that this happened,
>>  however.  And have the option to _disable_ it. A facetted interface
>>  probably helps here--or maybe vice versa, it's me assuming a facetted
>>  interface that led me to think about this.
>
> Automatic expansion is dangerous except when the number of
> retrievals is relatively small. Seems like a better way to go would be
> to give what was requested, but use something along the lines of
> wikipedia's disambiguation pages to help direct the user to other
> contexts. I agree in principle that a good system should automatically
> adjust the search in intelligent ways.
>
> Power users like things like facets, controls, and whatnot. From what I
> can tell, most people just want to type a search and get results
> without messing with anything. I like faceting, but statistics I've
> seen on facet use indicate that the vast majority of people either
> don't notice them, don't know what they do, or don't care about them.
>
> kyle
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 13:13:28 -0400
> From:    Jonathan Rochkind <rochkind_at_JHU.EDU>
> Subject: Re: word tools
>
> I agree that many features like facetting are only going to be used by
> 'power users'. But it's still important to provide them.
>
> I was thinking along those lines when I said that there should maybe be
> some automatic query expansion, but with a clear message to the user and
> way to eliminate it when desired.  That method to eliminate it would
> also probably only  be used by 'power users' (they're the only ones who
> would notice the message too!).
>
> I'm still thinking some kind of automatic query expansion might be
> needed for the non-power users. If only offered as an option (not
> automatic), I think most non-power-users won't take it!  And the nature
> of our authority files says to me that sometimes it's needed.
>
> Shouldn't someone searching for Alexandre Borodine have their query
> automatically expanded to the authorized heading "Borodin, Aleksandr
> Porfirevich, 1833-1887" (LC authority # n80128710)? Note the two
> different romanized spellings of his name there; the first query is NOT
> going to get items entered under the authorized heading without expansion.
>
> Tchaikovsky is another good example, of course.  I think that some kind
> of automatic query expansion is the way to actually make use of the
> power of our authority files.  I personally find it helpful to think of
> the authorized name as a kind of identifier. So automatic query
> expansion here is really matching their query to an identifier for an
> entity, and then using that identifier in the query to collocate all
> items having that identifier that matched their query.
>
> Of course, there are various tricks with this, it's not necessarily
> trivial and obvious to make this work right without getting in the way.
> But I think it's an area which should be explored.
>
> Jonathan
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 10:45:14 -0700
> From:    Karen Coyle <kcoyle_at_KCOYLE.NET>
> Subject: Re: word tools
>
> Jonathan Rochkind wrote:
>> I too have been thinking about how to incorporate "see also" type
>> references from our rich bibliographic records into our search
>> functions. In addition to "did you mean", it's possible that in
>> some/many cases, the search should be _automatically_ expanded. For
>> instance, from Eric's examples, in some cases when the user enters
>> "Jive" in a keyword search, should the search be automatically expanded
>> to include "OR subject: Blues (Music)"?
>
> This reminds me of the first time that we tried automatic integration of
> LC authority records with the university's bib records. Somehow,
> everyone named "Mia" also got indexed with the terms "Manila
> International Airport." In your case, a linguistics student would get a
> large number of unrelated music records.
>
> The problem with words is that they need context in order to have
> precise meaning. Facets can help this, but as Kyle mentions, most people
> don't use them. I think the key issue here is making it easy for the
> user to provide that little bit of context that is needed. That's
> definitely where we've failed so far. Unfortunately, I don't have a
> viable solution.
>
> kc
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kcoyle@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596   skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 13:56:44 -0400
> From:    Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: Re: word tools
>
> On Mar 31, 2008, at 1:13 PM, Jonathan Rochkind wrote:
>
>> ...I was thinking along those lines when I said that there should
>> maybe be some automatic query expansion, but with a clear message to
>> the user and way to eliminate it when desired....
>
>
> So, if some word tools were available, then some of us would use them
> to do query expansion, but can we think of other possible uses? For
> example, how about an Instant Pathfinder? Enter a word or phrase and
> get back an outline of facts/knowledge:
>
>    You searched for "blues". Assuming you meant music, then:
>
>      * Definitions of blues include... [use dictionary]
>      * Ideas similar to blues are... [use thesauri]
>      * Here is a short [encyclopedia] article on blues...
>      * Here is a list of books, articles, and music samples...
>        [use catalogs and indexes]
>      * Here are local people who are experts in blues... [use directory]
>      * The librarian to chat with for more information is...
>        [use directory]
>
> Same or similar things could be done for just about any subject.
>
> --
> Eric Lease Morgan
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 13:58:30 -0400
> From:    "Riley, Jenn" <jenlrile_at_INDIANA.EDU>
> Subject: Re: word tools
>
>> Automatic expansion is dangerous except when the number of
>> retrievals is relatively small. Seems like a better way to go
>> would be to give what was requested, but use something
>> along the lines of wikipedia's disambiguation pages to help
>> direct the user to other contexts. I agree in principle that
>> a good system should automatically adjust the search in
>> intelligent ways.
>
> The research I know of in this area suggests automatic expansion is
> effective for synonyms and narrower terms, and that broader and related
> terms are better offered to users as expansions on request. I get these
> two articles confused sometimes, but at least one of them talks about
> this issue at length, and I think they're a good place to start:
>
> Greenberg, J. (2001a), "Automatic query expansion via lexical-semantic
> relationships", Journal of the American Society for Information Science
> and Technology, Vol. 52 No. 6, pp. 402-15.
>
> Greenberg, J. (2001b), "Optimal query expansion (QE) processing methods
> with semantically encoded structured thesauri terminology", Journal of
> the American Society for Information Science and Technology, Vol. 52
> No. 6, pp. 487-98.
>
> I agree the landscape gets much more complex when you have the variety
> of meanings Eric used in his examples - the "did you mean" approach for
> multiple senses of a query makes sense to me.
>
> Jenn
>
> ========================
> Jenn Riley
> Metadata Librarian
> Digital Library Program
> Indiana University - Bloomington
> Wells Library W501
> (812) 856-5759
> www.dlib.indiana.edu
>
> Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
>
> ------------------------------
>
> Date:    Mon, 31 Mar 2008 14:24:04 -0400
> From:    Eric Lease Morgan <emorgan_at_ND.EDU>
> Subject: Re: word tools
>
> A while ago the following message was forwarded to our list:
>
>> Subject: [NGC4LIB] LCSH Browser: LCC added
>> Date: Wed, 14 Nov 2007 15:46:06 +0100
>> From: Bernhard Eversberg <ev_at_BIBLIO.TU-BS.DE>
>> Reply-To: Next generation catalogs for libraries <NGC4LIB_at_listserv.nd.edu>
>> To: NGC4LIB_at_listserv.nd.edu
>>
>> Nathan Rinne informed this forum, last week, about our experiment
>> in LCSH browsing:
>>
>>   http://www.biblio.tu-bs.de/db/lcsh/
>>
>> Now, since the topical term LCSH authority records often contain
>> LC Class numbers as well, we've just added another index and
>> arranged it in ascending LCC order, displaying the topical term
>> next to the number so as to give an idea what the number means.
>
>
>
> It points to another example of how a word tool could be used as a
> part of a "next generation" library catalog. In the example LCSH is
> used as the thesaurus, but any number of thesauri could be used as
> well. Not everything is described using LCSH, and the "catalog" may
> very well contain things that are not books. Think journal articles,
> encyclopedia articles, pictures, data sets, even definitions, etc.
>
> Something like WordNet could form the technical foundation for such
> a word tool implementation. [1] WordNet's infrastructure (read, API)
> could even be used to allow people to add "tags" to supplement the
> thesaurus. Hmmm...
>
> [1] http://wordnet.princeton.edu/
>
> P.S. "Thanks Mary L. for bringing this to my attention!"
>
> --
> Eric Lease Morgan
>
> ------------------------------
>
> End of NGC4LIB Digest - 30 Mar 2008 to 31 Mar 2008 (#2008-67)
> *************************************************************
>

--
Emily Lynema
Systems Librarian for Digital Projects
Information Technology, NCSU Libraries
919-513-8031
emily_lynema_at_ncsu.edu