Thanks for sharing this - very interesting. Does this suggest that Lucene could be on the way to being the 'default' indexing engine (similar to Apache being the 'default' web server), with products differentiating themselves by the way they build functionality on top of this (different displays, APIs, Workflow support etc.)?
The ToC navigator is interesting, but also shows (for me) a problem with the 'Clouds' as navigation. As you drill down, you get to the point where the Author Cloud is filled with equally weighted terms (e.g. http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=dauthor&journal=%22Biochemistry%20and%20Cell%20Biology%22&syear=2005&eyear=2005&numCloudDocs=600&numCloudTags=70) - presumably each author has written a single article. Where you have unequally weighted terms, the terms obviously differentiate themselves within the Cloud by size. However, where they all have the same weight, the impression is of a mass of text, in which individual terms are hard to differentiate.
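To illustrate the point, here is a minimal Python sketch of one way a cloud renderer might handle this. It is not how Ungava works: the function names, the point sizes and the log scaling are all my own assumptions. It sizes terms by count, and falls back to a plain alphabetical list when every term has the same weight:

```python
import math

def cloud_sizes(counts, min_pt=10, max_pt=28):
    """Map positive term counts to font sizes (hypothetical helper).

    Returns None when every term has the same count, since there is
    then no size contrast for a cloud to show.
    """
    if not counts:
        return None
    lo, hi = min(counts.values()), max(counts.values())
    if lo == hi:
        return None  # equally weighted terms: caller should not draw a cloud
    span = math.log(hi) - math.log(lo)
    return {
        term: round(min_pt + (max_pt - min_pt) * (math.log(n) - math.log(lo)) / span)
        for term, n in counts.items()
    }

def render(counts):
    """Render a sized cloud, or a one-term-per-line list if weights are equal."""
    sizes = cloud_sizes(counts)
    if sizes is None:
        return "\n".join(sorted(counts))
    return " ".join(f"{term}({sizes[term]}pt)" for term in sorted(sizes))
```

With something like this, a volume where each author has a single article would come out as a readable sorted list rather than a block of same-sized text.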
Best
Owen
Owen Stephens
Assistant Director: e-Strategy and Information Resources
Imperial College London Library
Imperial College London
South Kensington
London SW7 2AZ
Tel: 020 7594 8829
Email: o.stephens_at_imperial.ac.uk
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Glen Newton -
> NRC/CNRC CISTI/ICIST Research
> Sent: 17 October 2007 17:09
> To: NGC4LIB_at_listserv.nd.edu
> Subject: Re: [NGC4LIB] Ungava project: Search, browse,
> visualize catalog and article-level collections
>
> Eric,
>
> Thank you for your very kind words! And yes, Open Source software is
> giving us some very excellent, rich and stable choices these days.
>
> APIs: The plan is to get OpenSearch working on top of this index, to
> make it usable for others...
>
> I thought I should point out one thing that might not be obvious to
> someone casually taking a look at Ungava: in the NRC Research Press
> article collection, there is a (spartan) table of contents navigator
> implemented[0]. At each level (title[1], volume[2]) the user
> can generate the keyword or author cloud of all articles contained by
> that level in the hierarchy. So the user can get a bird's eye view of
> the journal as defined by its keywords or authors, across the entire
> journal[3] [4] or for a particular volume[5][6].
>
> I didn't necessarily have a plan to build many of these interesting
> and useful tools, but since it was so relatively easy and pain-free to
> build the underlying indexing infrastructure, it allows me to focus
> on creative ways of using the content and the indexing.
>
> Thanks,
>
> Glen
>
> [0]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Browse
> [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Browse?calyHandler=showVolumes&journal=1253
> [2]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Browse?calyHandler=showIssues&collection=jos&journal=1253&volume=23381
> [3]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=keyword&journal=%22Canadian%20Journal%20of%20Botany%22&numCloudDocs=718&numCloudTags=100
> [4]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=dauthor&journal=%22Canadian%20Journal%20of%20Botany%22&numCloudDocs=800&numCloudTags=70
> [5]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=keyword&syear=2004&eyear=2004&journal=%22Canadian%20Journal%20of%20Botany%22&numCloudDocs=173&numCloudTags=100
> [6]http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=dauthor&syear=2004&eyear=2004&journal=%22Canadian%20Journal%20of%20Botany%22&numCloudDocs=173&numCloudTags=60
>
> --
> Glen Newton | glen.newton_at_nrc-cnrc.gc.ca
> Researcher, Information Science, CISTI Research
> & NRC W3C Advisory Committee Representative
> http://tinyurl.com/yvchmu
> tel/tél: 613-990-9163 | facsimile/télécopieur 613-952-8246
> Canada Institute for Scientific and Technical Information (CISTI)
> National Research Council Canada (NRC)| M-55, 1200 Montreal Road
> http://www.nrc-cnrc.gc.ca/
> Institut canadien de l'information scientifique et technique (ICIST)
> Conseil national de recherches Canada | M-55, 1200 chemin Montréal
> Ottawa, Ontario K1A 0R6
> Government of Canada | Gouvernement du Canada
> --
>
> > From: Eric Lease Morgan <emorgan_at_ND.EDU>
> > Subject: Re: Ungava project: Search, browse, visualize catalog and article-level collections
> >
> > On Oct 16, 2007, at 9:44 AM, Glen Newton - NRC/CNRC CISTI/ICIST
> > Research wrote:
> >
> > > Ungava [1] is a Lucene-based test-bed for experimental, scalable
> > > search, browse and results visualization of library catalogs and
> > > article-level collections. The first release also includes an
> > > implementation of drill clouds [2] for search refinement, as well
> > > as using Simile's [3] Timeline for temporal visualization of
> > > search results and Exhibit for faceted interactions.
> > >
> > > Forthcoming collection instances:
> > >
> > > - DOAJ: ~2900 journals, ~900 full-text, ~160k articles
> > > - BioMed Central: ~180 journals
> > > - arxiv.org: 443,988 e-prints, metadata & full-text
> > > - University of Denver Catalogue
> > > - CERN Document Server: ~900k articles, ~360k fulltext
> > >
> > > [1] http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Ungava
> > > [2] http://zzzoot.blogspot.com/2007/10/drill-clouds-for-search-refinement-id.html
> > > [3] http://simile.mit.edu/
> >
> >
> > Thank you for bringing this to the list's attention. There are two
> > things about the work I would like to highlight for subscribers.
> >
> > First, the system seems to be built using existing tools "freely"
> > available on the Web. An individual was able to identify a technical
> > problem needing to be solved and then constructed a possible
> > solution. Open source software combined with the availability of
> > globally networked computers make such a process feasible. Such a
> > combination fosters innovation.
> >
> > Second, and just as important in my mind, is the content of the
> > "catalog". It does (or will) include books as well as journal
> > articles. In the long run, I sincerely believe users will find such
> > an index to be more useful since it aggregates content into a single
> > place. Fewer bibliographic silos.
> >
> > Kudos, and good luck.
> >
> > P.S. Consider implementing an API against the index. For example,
> > consider the use of OpenSearch [1] and/or SRU [2]. Given an API: 1)
> > others could re-use your index in other venues, and 2) you would not
> > be married to any particular indexer/search engine.
> >
> > [1] http://www.opensearch.org/
> > [2] http://www.loc.gov/standards/sru/
> >
> > --
> > Eric Lease Morgan
> > University Libraries of Notre Dame
>
Received on Thu Oct 18 2007 - 04:52:15 EDT