Re: Whose elephant is it, anyway? (the OLE project)

From: Stephens, Owen <o.stephens_at_nyob>
Date: Fri, 13 Mar 2009 10:52:40 +0000
To: NGC4LIB_at_LISTSERV.ND.EDU

Thanks Mark

From your comments it sounds like the browse functions are ones you've added on to the basic VuFind functionality. Can you confirm? If so, are there any plans to integrate this functionality back into VuFind core?

Thanks - and congratulations on the implementation - it looks great :)

Owen

Owen Stephens
Assistant Director: eStrategy and Information Resources
Central Library
Imperial College London
South Kensington Campus
London
SW7 2AZ

t: +44 (0)20 7594 8829
e: o.stephens_at_imperial.ac.uk

> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Mark Triggs
> Sent: 13 March 2009 09:44
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] Whose elephant is it, anyway? (the OLE project)
> 
> [Apologies if this comes through twice.  I sent this about 14 hours ago
> and haven't seen it arrive yet, so...]
> 
> Hi all,
> 
> I've been watching this discussion with some interest because I'm the
> guy who implemented the browse functionality in the NLA's catalogue.  I
> just thought I'd jump in and confirm/deny a few things here.
> 
> Our current title and uniform title browses were among the first
> browses we attempted to implement, so they're currently a bit of a
> legacy feature.  We implemented these using a combination of Solr range
> queries and sorting and they mostly sort of work, but perhaps not quite
> as smoothly as the other browses (as evidenced by the 'internal server
> error' that Owen managed to produce ;o).  I'm on holidays at the
> moment, but this will probably be revisited when I'm back at work.
> 
> Our other browses (names, subjects, callnumbers and series) make use of
> a combination of SQLite databases and Lucene indexes.  Each browse
> consists of an SQLite database with a single table of two columns: a
> sort key and the text of the browse heading.  When we receive a request
> to browse from a certain point we can get back the pageful of headings
> to display by using a simple SQL SELECT statement.  For each heading
> listed we determine the number of titles matched and any
> cross-references by performing Lucene term queries (fast) on indexes of
> our bib data and authority data respectively.  All of this is handled
> by a Solr browse handler I've written, so all our VuFind code needs to
> know is to hit the browse handler and style the XML it gets back.
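>
A minimal sketch of the two-column browse table and the paging SELECT described above, using Python's standard sqlite3 module. The table name `headings`, the column names, and the `make_sort_key` normalisation are assumptions for illustration only; the actual NLA schema and sort-key rules are not given in the message, and the Lucene term queries for hit counts and cross-references are left out.

```python
import sqlite3


def make_sort_key(heading):
    # Hypothetical normalisation: lowercase and collapse whitespace.
    # A real catalogue would apply its own filing rules here.
    return " ".join(heading.lower().split())


def build_browse_db(headings):
    """Create an in-memory browse table with one (sort_key, heading) row each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE headings (sort_key TEXT, heading TEXT)")
    conn.executemany(
        "INSERT INTO headings (sort_key, heading) VALUES (?, ?)",
        [(make_sort_key(h), h) for h in headings],
    )
    # Index the sort key so finding the start point does not scan the table.
    conn.execute("CREATE INDEX idx_sort_key ON headings (sort_key)")
    conn.commit()
    return conn


def browse_page(conn, start_from, page_size=10):
    """Return one page of headings, starting at the first key >= start_from."""
    rows = conn.execute(
        "SELECT heading FROM headings "
        "WHERE sort_key >= ? ORDER BY sort_key LIMIT ?",
        (make_sort_key(start_from), page_size),
    )
    return [row[0] for row in rows]
```

Calling `browse_page(conn, "C", 2)` on a table holding a handful of author names would return the first two headings filing at or after "c"; the following page could be fetched by passing the last sort key seen as the new start point.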
> 
> Regarding scalability, our largest browse is the callnumber browse,
> which consists of about 3 million entries (for 4 million bib records).
> I've tested this SQLite approach up to 20 million entries and it
> continued to perform well, so I'm not terribly worried for now.
> Finding the point to browse from is effectively just searching a big
> sorted text file, so I would expect O(log N) growth here anyway.  Plus,
> our largest SQLite database still fits entirely in memory, so that's
> nice too.
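>
To make the O(log N) expectation concrete, here is a rough sketch that treats the browse list as a sorted Python list and finds the start point by binary search. This is only an analogy: SQLite looks the key up in a B-tree index rather than bisecting a flat list, and the function names are invented for this example.

```python
import bisect


def find_browse_start(sorted_keys, start_from):
    """Index of the first key >= start_from, found by binary search."""
    return bisect.bisect_left(sorted_keys, start_from)


def max_probes(n):
    """Worst-case number of binary search steps over n sorted entries."""
    return max(1, n.bit_length())


# Growing the list from 3 million to 20 million entries adds only a few steps.
print(max_probes(3_000_000))   # 22
print(max_probes(20_000_000))  # 25
```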
> 
> In terms of indexing performance, the SQLite databases take about 5-10
> minutes to build in total, and they're built from scratch every time.
> For each type of browse we pull all the browse headings from our bib
> and authority data, remove any duplicates then load them all into an
> SQLite database.  My browse handler notices when these databases have
> been updated and automatically reopens them, so the update is
> transparent.
> Currently we just do these updates once per night, as this is how often
> we update our main bib indexes and it makes sense to keep the updates
> synchronised, but I don't see any problem with doing this more often if
> it made sense.
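>
One way the rebuild-and-reopen cycle could be sketched, assuming the new database is built in a temporary file and swapped into place, and that the reader spots the swap by comparing file metadata. The message does not say how the real handler detects updates, so `rebuild_browse_db`, `ReloadingBrowseDb`, and the lowercase sort key are all hypothetical. The swap relies on POSIX rename behaviour, where a reader holding the old file open is not disturbed.

```python
import os
import sqlite3
import tempfile


def rebuild_browse_db(path, headings):
    """Build a fresh database next to `path`, then atomically replace it."""
    unique = sorted(set(headings))  # remove duplicate headings
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    os.close(fd)
    conn = sqlite3.connect(tmp_path)
    conn.execute("CREATE TABLE headings (sort_key TEXT, heading TEXT)")
    conn.executemany(
        "INSERT INTO headings VALUES (?, ?)",
        [(h.lower(), h) for h in unique],  # assumed sort key: lowercase
    )
    conn.execute("CREATE INDEX idx_sort_key ON headings (sort_key)")
    conn.commit()
    conn.close()
    os.replace(tmp_path, path)  # readers see the old file or the new one


class ReloadingBrowseDb:
    """Reopens the SQLite file whenever it has been replaced on disk."""

    def __init__(self, path):
        self.path = path
        self.conn = None
        self.stamp = None

    def connection(self):
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_ino)
        if stamp != self.stamp:  # file was rebuilt since we last looked
            if self.conn is not None:
                self.conn.close()
            self.conn = sqlite3.connect(self.path)
            self.stamp = stamp
        return self.conn

    def count(self):
        cur = self.connection().execute("SELECT COUNT(*) FROM headings")
        return cur.fetchone()[0]
```

Because the whole file is replaced in one step, running the rebuild more often than nightly would only change how frequently readers reopen the database.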
> 
> I'm happy to answer any questions about our implementation either on or
> off list.
> 
> Cheers,
> 
> Mark
> 
> 
> "Stephens, Owen" <o.stephens_at_IMPERIAL.AC.UK> writes:
> 
> > Bernhard,
> >
> > Just to understand what you are looking for in terms of Browse. The
> > NLA implementation of VuFind has what I would regard as a Browse
> > function - you can Browse the following:
> >
> > Names at http://catalogue.nla.gov.au/Browse/Names?browse=names&from=
> > Subjects at
> > http://catalogue.nla.gov.au/Browse/Subjects?browse=subjects&from=
> > Callnumbers at
> > http://catalogue.nla.gov.au/Browse/Subjects?browse=subjects&from=
> > Series at
> > http://catalogue.nla.gov.au/Browse/Series?browse=series&from=
> >
> > All these options are available in the user interface at
> > http://catalogue.nla.gov.au/Browse/Home ('Browse' is an option in the
> > horizontal menu under the main 'catalogue' banner)
> >
> > This page also offers Title and Uniform Title browsing, but these
> > seem not to work in the same way at the moment (I've sent feedback
> > about this)
> >
> > Is this browsing as you mean it? If not, what would you require
> > additionally?
> >
> > (also you question the scalability - what scale are you thinking of?
> > I'd guess that NLA is reasonably large - but I can't easily find a
> > figure for the number of bib records - but obviously it may not be as
> > large as other national libraries or consortium collections)