Re: bibtip (How it works)

From: Kevin M Kidd <kiddk_at_nyob>
Date: Fri, 16 May 2008 16:25:00 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

> Like other systems that follow where users go, not whether they liked it 

> there and what they did there, BibTip is susceptible to "ant navigation" 

> problems

In fact, your "ant navigation" analogy is a faulty one in this case. BibTip works astoundingly well, and it is not because it simply follows "where users go".  Instead, BibTip uses "Repeat Buying Theory" as a framework to statistically analyze user search behavior. Repeat Buying Theory is a highly successful and well-tested statistical framework to describe the regularity of repeat-buying behavior of consumers within a distinct period of time. 

The developers of BibTip at Karlsruhe University very skillfully adapted this theory to the session-based search behavior of library OPAC users. The key is that BibTip records only the inspection of the full details of an individual bib record selected from a larger list of search results. It does not "follow" the user. In this framework, clicking on and reading the full details of a given record is an economic choice: choosing one record over all of the others in a given list is very similar to an individual's choice to purchase one thing over another during a given trip to the store. There is a real cost in time (i.e., an economic cost) for the user each time he/she selects and views a record. It can be assumed that this "search cost" is high enough that he/she is willing to view the details only of a record which is truly of interest. Users, in effect, are self-selecting. That is, users with common interests will select the same documents, and, since recommendations are provided only from the full details view, we can surmise that recommendations are offered only to interested users.
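To make the "full details only" point concrete, here is a minimal sketch in Python of how such a log might be reduced to per-session baskets. It is an illustration under my own assumptions, not BibTip's actual code: the event fields (session_id, page, record_id) and the function name build_session_baskets are hypothetical.

```python
from collections import defaultdict


def build_session_baskets(events):
    """Group full-detail record views by session, ignoring all other events.

    The event layout (session_id / page / record_id) is a hypothetical
    log format chosen for illustration only.
    """
    baskets = defaultdict(set)
    for event in events:
        # Only a full-details view counts as a "purchase"; browsing a
        # result list is not recorded.
        if event["page"] == "full_details":
            baskets[event["session_id"]].add(event["record_id"])
    return dict(baskets)


events = [
    {"session_id": "s1", "page": "result_list", "record_id": None},
    {"session_id": "s1", "page": "full_details", "record_id": "rec42"},
    {"session_id": "s1", "page": "full_details", "record_id": "rec7"},
    {"session_id": "s2", "page": "result_list", "record_id": None},
    {"session_id": "s2", "page": "full_details", "record_id": "rec42"},
]
baskets = build_session_baskets(events)
```

Each basket is a set, so viewing the same record twice in one session counts once, which is how I read "sessions in which record X has been viewed".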

In order to build relationships among given documents, BibTip analyzes record pairs. For each record X that has been viewed in the full details view of the OPAC, a "purchase history" is built. This is simply a list of all of the sessions in which record X has been viewed. Record X is then compared with all other records (Y) which have been viewed in the same session as X. For each pair of records (X,Y) that have been viewed in the same session, a second purchase history is built. The number of users who have viewed record X and another record Y in the same session is statistically analyzed and the probability of a "co-inspection" of records X and Y in a given session is calculated. A recommendation for record X (that is, "users who liked X also liked Y") is created when record Y has been viewed in the same session as X more often than can be expected from random selections.
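The pair analysis described above can be sketched roughly as follows. This is a simplified stand-in, not BibTip's method: instead of fitting the repeat-buying model, it compares the observed number of co-inspections with what independent (random) viewing would predict, and the thresholds min_lift and min_sessions are arbitrary values I picked for illustration.

```python
from collections import Counter
from itertools import combinations


def co_inspection_counts(sessions):
    """Count the sessions containing each record and each unordered record pair."""
    record_counts = Counter()
    pair_counts = Counter()
    for viewed in sessions:
        unique = sorted(set(viewed))  # one vote per record per session
        record_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return record_counts, pair_counts


def recommend(sessions, x, min_lift=1.5, min_sessions=2):
    """Return records co-inspected with x more often than chance would suggest.

    Baseline (an assumption, not the repeat-buying model): if views were
    independent, x and y would share about n_x * n_y / n sessions.
    """
    record_counts, pair_counts = co_inspection_counts(sessions)
    n = len(sessions)
    results = []
    for (a, b), observed in pair_counts.items():
        if x not in (a, b):
            continue
        y = b if a == x else a
        expected = record_counts[x] * record_counts[y] / n
        if observed >= min_sessions and observed > min_lift * expected:
            results.append((y, observed / expected))
    # Strongest excess over chance first; ties broken by record id.
    results.sort(key=lambda r: (-r[1], r[0]))
    return [y for y, _ in results]


sessions = [
    ["A", "B"], ["A", "B", "C"], ["A", "B"], ["C", "D"],
    ["D"], ["C"], ["A", "C"], ["D", "E"],
]
print(recommend(sessions, "A"))
```

In this toy data, B shares three sessions with A against about 1.5 expected by chance, so it is recommended; C shares two against 2.0 expected, so it is not.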

This "repeat buying theory" is remarkably good at automatically determining relevant recommendations for a given item. It takes some time for enough data to be collected so that good recommendations are available for a substantial part of a collection, but what is the hurry? Of course, the longer you have the algorithm running, the better your recommendations become. The more users you have, the better your recommendations become. But, time is on our side in this case ;-)

BibTip is a signal example of harnessing collective intelligence to serve the needs of the library.

Frustratingly, for all the talk here and elsewhere of the features of next generation catalogs, I rarely find anything that convinces me that librarians understand that collecting/harvesting and re-using user (and usage) data is the key to most (if not all) of the services we want these new catalogs to provide. Without seriously thinking about the implications of harnessing collective intelligence - and taking steps *now* to build systems that do - we are not going to get very far. BibTip as a service is a big step in the right direction.

---------------------------------------------

Kevin M. Kidd, MA, MLIS

Library Applications & Systems Manager

Boston College Libraries

Phone: 617-552-1359

Fax: 617-552-1089

e-Mail: kevin.kidd_at_bc.edu

Blog: http://datadrivenlibrary.blogspot.com/

-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Tim Spalding
Sent: Thursday, May 15, 2008 5:46 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] bibtip

Oh, this is red meat. I'd talk about recommendation systems till you

all throw me off.

1. You can't draw conclusions based upon a small number of overlapping

"trips." If one trip were enough and you knew I'd looked at something

super-obscure, you could probably figure out the other pages I'd

looked at too. Just go to the page you saw me browsing and see what

appears in the "people who looked at this also looked at..." box. If

my obscure book of Hellenistic poetry overlaps with "Having an Affair

for Dummies," I'm in trouble with the missus.

2. But if you need multiple overlaps, the amount of usable data goes

way down. This is, I submit, what Ann Arbor's recommendation system

showed. You need a lot of data in a recommendation system for it to

work. The worse the data, the more you need. (On LibraryThing, we do

not generally even *try* to make a recommendation when there are fewer

than 15 copies of a book in the system, and those aren't books you

casually looked at, those are books in people's personal collection.)

3. Like other systems that follow where users go, not whether they

liked it there and what they did there, BibTip is susceptible to "ant

navigation" problems. You know how ants find their way about? They

follow the trail put down by other ants. This works well in general,

but it can also go bad. An ant gets lost. Another ant happens on the

trail, and gets lost too, a third sees a really strong trail, so

three are lost, etc. At its worst you have the famous phenomenon of

ants going round and round in a circle, following other ants and their

ever-stronger trail, until all the ants die of exhaustion!

I ask you: Do we want library patrons dying of exhaustion?

4. In all seriousness, the ant problem is real. Every time the catalog

sends you somewhere you don't want to go, you've made a trail telling

the next guy to go there too. If library catalogs worked, ant-tracking

would too. But when I type "Harry Potter" into the search box of a

large public library I use all the time, I don't get a real-live

English-language Harry Potter book until item number nine!

5. At one point I looked into the "people who've looked at this also

looked at that" algorithm, and I thought that Amazon claimed a patent

on it. All I can find now is a really general patent on recommendation

systems. If others know the situation, I'd love to hear of it.

Tim

On Thu, May 15, 2008 at 4:42 PM, Eric Lease Morgan <emorgan_at_nd.edu> wrote:

> The May/June 2008 issue of D-Lib Magazine includes an article called

> "Adding Value to the Library Catalog by Implementing a Recommendation

> System" that may be of interest to people on this mailing list. [1]

> 

> Specifically, the article describes an "implicit" recommender

> service called BibTip. The application sits between a user and the

> "catalog" while collecting what gets used by whom and when. Based on

> this sort of information, the application makes suggestions for other

> items in the "catalog".

> 

> One thing I found particularly interesting was that this application

> is not necessarily "catalog" specific in that it can be invoked

> through a JavaScript call in the head element of HTML. Again, it is not

> about creating a specific application that people come to. Instead,

> it is about creating applications that can be integrated and

> syndicated to other venues.

> 

> [1] http://www.dlib.org/dlib/may08/monnich/05monnich.html

> 

> --

> Eric Lease Morgan

> Head, Digital Access and Information Architecture Department

> Hesburgh Libraries, University of Notre Dame

> 

> (574) 631-8604

> 

--

Check out my library at http://www.librarything.com/profile/timspalding