indexing Wikipedia - for the offline/CD project

From: Jodi Schneider <jschneider_at_nyob>
Date: Mon, 2 Aug 2010 22:53:29 +0100
To: CODE4LIB_at_LISTSERV.ND.EDU
Does indexing seem oddly satisfying to you? Do you get a niggling
feeling about classifications that aren't quite thesauri, where
transitivity doesn't quite work? [1]

Want to help provide indices for the offline Wikipedia? At Wikimania,
I talked with Martin Walker, who's working on the offline Wikipedia
project. They're looking for (volunteer) help improving
their indices. Contact Martin (cc'd) for more details.

-Jodi

[1] http://inkdroid.org/journal/2008/01/23/lcsh-thesauri-and-skos/

=======
We need to be able to take an alphabetical list of
30,000-50,000 articles, such as this one:
http://en.wikipedia.org/wiki/Wikipedia:0.7/0.7alpha
(or see a few of our latest selections here:
http://toolserver.org/~enwp10/release-data/)

and generate indexes by topic, and by location:
http://en.wikipedia.org/wiki/Wikipedia:0.7/0.7index
http://en.wikipedia.org/wiki/Wikipedia:0.7/0.7geo

The code we used can be found here:
http://svn.toolserver.org/svnroot/cbm/SelectionBot/index_tools/
It was very powerful, as you can see, but there were definite bugs.
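
As a rough illustration of the task (this is not the SelectionBot code,
just a minimal sketch), here is how a flat list of articles could be
grouped into a topic index in Python. It assumes each article already
comes with a single topic label; the (title, topic) input format, the
function name build_topic_index, and the sample data are all
hypothetical. A by-location index could be built the same way by
swapping the topic label for a place name.

```python
from collections import defaultdict


def build_topic_index(articles):
    """Group (title, topic) pairs into a dict of topic -> sorted titles.

    The (title, topic) input format is an assumption made for this
    sketch; the real selection data may be structured differently.
    """
    index = defaultdict(list)
    for title, topic in articles:
        index[topic].append(title)
    # Sort topics alphabetically, and titles case-insensitively within each.
    return {
        topic: sorted(titles, key=str.casefold)
        for topic, titles in sorted(index.items())
    }


# Hypothetical sample data, not taken from the actual article selection.
sample = [
    ("Potsdam, New York", "Geography"),
    ("Chemistry", "Science"),
    ("Albany, New York", "Geography"),
]

topic_index = build_topic_index(sample)
for topic, titles in topic_index.items():
    print(topic)
    for title in titles:
        print("  " + title)
```

The hard part in practice is likely to be assigning topics and
locations to articles in the first place, which this sketch takes as
given.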

Many thanks,
Martin

-- 
Martin A. Walker Associate Professor of Chemistry
SUNY Potsdam
Potsdam, NY 13676
USA
Tel: +1 (315) 2672271
Fax: +1 (315) 2673170
walkerma_at_potsdam.edu
Received on Mon Aug 02 2010 - 17:53:46 EDT