Re: next "next-generation library catalog"

From: Oliver Flimm <flimm_at_nyob> Date: Thu, 1 Jul 2010 10:08:13 +0200 To: NGC4LIB_at_LISTSERV.ND.EDU

Hi,

On Wed, Jun 30, 2010 at 05:02:35PM +0100, David Pattern wrote:
> We've had recommendations ("people who borrowed this...") based on circ data in our OPAC since 2005 and it's had a huge (positive) impact on how stock circulates and on how many items students borrow per year, e.g.
> http://library.hud.ac.uk/catlink/bib/415607/cls/

When we started experimenting with our own recommendation system back
in 2006, we evaluated both the circulation data of our library system
and the usage data of our KUG OPAC. We initially went the circulation
route, only to find that:

a) we couldn't get enough data. Each borrowed book yields only one
data point every 4 weeks (until it's returned), which results in fewer
items to analyse and thus worse statistics - although we have quite a
lot of loans (including ILL!!!), with 1,015,450 circulations in 2009.
The number of items on loan at any one time ranges from 60,000 to
150,000 at our library.

b) our library system is not very willing to give us information about
circulation events, so we had to export a daily snapshot of all items
borrowed, and then figure out the actual circulation activity - not
what I would call compelling...

c) we didn't want to struggle with German data protection obligations
concerning user data, which are rather strict
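
To illustrate point b): deriving circulation activity from daily
snapshots boils down to a set difference between two consecutive days.
This is only a minimal sketch, not our actual export code; the function
name and the item identifiers are made up for the example, and it
assumes each snapshot is simply the set of items on loan that day.

```python
def diff_snapshots(previous, current):
    """Derive circulation events from two consecutive daily snapshots.

    Each snapshot is assumed to be a set of identifiers of the items
    that were on loan when the snapshot was taken.
    """
    borrowed = current - previous   # on loan today, but not yesterday
    returned = previous - current   # on loan yesterday, but not today
    return borrowed, returned


# Hypothetical snapshots for two consecutive days.
monday = {"item-1", "item-2", "item-3"}
tuesday = {"item-2", "item-3", "item-4"}

borrowed, returned = diff_snapshots(monday, tuesday)
print("borrowed:", sorted(borrowed))
print("returned:", sorted(returned))
```

An item that is returned and borrowed again between two snapshots
stays invisible to this approach, which is one more reason why the
result is only an approximation of the real circulation activity.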

Instead we then tried to analyse KUG usage. We assumed that every
time a user selects a specific title from a search result list, it
might be of interest to them, so we accumulated all those titles for
a specific anonymous session. Then we compared the titles of
different sessions. Pretty thin... we thought initially,
until we analysed all those packets of titles per session.
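
The session-based idea can be sketched roughly as follows: count how
often two titles show up in the same anonymous session and recommend
the most frequent companions. This is an illustration under simple
assumptions, not the KUG implementation; the function names, the
title IDs and the min_count threshold are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count, per title, how often other titles were viewed in the same session."""
    related = defaultdict(Counter)
    for titles in sessions:
        # Count each title once per session, however often it was clicked.
        for a, b in combinations(sorted(set(titles)), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def recommend(related, title, limit=5, min_count=2):
    """Return the titles most often co-viewed with `title`, dropping rare pairs."""
    return [t for t, n in related[title].most_common(limit) if n >= min_count]


# Hypothetical packets of titles, one list per anonymous session.
sessions = [
    ["t1", "t2", "t3"],
    ["t1", "t2"],
    ["t2", "t4"],
    ["t1", "t2", "t4"],
]

related = build_cooccurrence(sessions)
print(recommend(related, "t1"))
```

A threshold like min_count is one simple way to suppress pairs that
were seen together only once; it will not filter out titles that
co-occur systematically for reasons unrelated to their content.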

The result was very good, although we had - like Amazon - 'false'
titles too. Interestingly enough these were quite often the result of
tutorials we offer to our users, which always use the same titles from
different subject areas ;-)

All of this usage data (and much more) has been collected over the
last 2-3 years in a separate statistics database of our KUG system.
Right now we have registered around 5,600,000 clicks for a full title
display.

BTW, does anybody know of an ontology to describe the click events
per session in the raw data we collected, or recommendations in
general? Then we could - like our bibliographic records - also release
our raw or processed recommendation data as Open Data ;-)

To be of any use to others it would also be essential to attach an
identifier, like ISBN, to every media item. This would reduce the
number of recommendations but would make the data much more usable
elsewhere.
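
A rough sketch of what that could look like: re-key the
recommendations from local title IDs to ISBNs and drop everything
that has no ISBN. The function name, the mapping and the sample
values are assumptions for illustration only, not our data model.

```python
def rekey_by_isbn(recommendations, isbn_of):
    """Re-key recommendations from local title IDs to ISBNs.

    Titles without an ISBN are dropped, both as the source title and
    as a recommended title.
    """
    portable = {}
    for title_id, related_ids in recommendations.items():
        isbn = isbn_of.get(title_id)
        if isbn is None:
            continue  # no ISBN, so the title cannot be matched elsewhere
        related_isbns = [isbn_of[r] for r in related_ids if r in isbn_of]
        if related_isbns:
            portable[isbn] = related_isbns
    return portable


# Hypothetical local data: title "t3" has no ISBN.
recommendations = {"t1": ["t2", "t3"], "t3": ["t1"]}
isbn_of = {"t1": "9780306406157", "t2": "9783161484100"}

portable = rekey_by_isbn(recommendations, isbn_of)
print(portable)
```

In this toy example three local recommendations shrink to a single
ISBN-keyed one, which shows the trade-off between volume and
reusability.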

Just my 0.02 EUR ;-)

Regards,

O. Flimm

-- 
Universitaet zu Koeln :: Universitaets- und Stadtbibliothek
IT-Dienste :: Abteilung Universitaetsgesamtkatalog
Universitaetsstr. 33 :: D-50931 Koeln
Tel.: +49 221 470-3330 :: Fax: +49 221 470-5166
flimm_at_ub.uni-koeln.de :: www.ub.uni-koeln.de