Sorry, this posting is a technical one. But the answers to Bernhard's
questions may be of interest to others, too...
Bernhard Eversberg wrote:
> How is it affected by the physical growth of the data? Does it get
> slower with every million records, and by how much?
Depends on your hardware. I just expanded a Solr index from 5 million to
about 20 million records. Its size is now about 80 GB (there is a lot of
redundancy) and at the moment it lives on a USB disk. Searching slowed
down a bit. With 5 million records and 50 simultaneous users sending
searches, we had average response times of about 100 to 200 ms; now we
have 250 to 300 ms. My guess is that the USB interface to the disk is
the limiting factor; we will investigate that.
> How long does it take to create the index?
We are indexing about 200 to 500 (bibliographic) records per second.
Indexing speed in Solr depends on the amount of text you put into it and
what processing you do during indexing.
When I add fulltext article data (each file about 10 kB of text), the
indexing rate drops to about 50 to 80 records per second, because there
is much more text to process than in a bibliographic record.
> Is real-time updating possible?
You can update or add records at any time; it doesn't hurt. But they
are only findable after a "commit" command has been sent to Solr. Such
a commit may (depending on index size, Solr configuration and hardware)
take anywhere from a few to many seconds (the index remains searchable
during the commit, so it is not a "system blackout"). So it is not true
real-time updating, because in a library environment you don't want to
issue a commit after every single record update. But sending a commit
every 10 minutes or so would be a good strategy.
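To illustrate the commit strategy, here is a minimal Python sketch, not
our actual setup. It assumes a Solr instance whose XML update handler is
reachable at http://localhost:8983/solr/update (adjust the URL for your
installation). The names post_xml and PeriodicCommitter are made up for
this example. Each record is sent as an <add> message right away, but a
<commit/> is only issued once the interval (600 seconds here) has
passed since the last commit.

```python
import time
import urllib.request

# Assumption: a local Solr with the XML update handler at this URL.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def post_xml(xml, url=SOLR_UPDATE_URL):
    """POST one XML update message (<add>...</add> or <commit/>) to Solr."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


class PeriodicCommitter:
    """Hypothetical helper: add records at once, commit at most every interval_s."""

    def __init__(self, send=post_xml, interval_s=600, clock=time.monotonic):
        self.send = send              # function that delivers XML to Solr
        self.interval_s = interval_s  # 600 s = "every 10 minutes or so"
        self.clock = clock            # injectable so it can be tested
        self.last_commit = clock()
        self.pending = 0              # records added since the last commit

    def add(self, doc_xml):
        """Send one <doc>...</doc>; it becomes findable after the next commit."""
        self.send("<add>" + doc_xml + "</add>")
        self.pending += 1
        return self.maybe_commit()

    def maybe_commit(self):
        """Commit only if something is pending and the interval has elapsed."""
        if self.pending and self.clock() - self.last_commit >= self.interval_s:
            self.send("<commit/>")
            self.last_commit = self.clock()
            self.pending = 0
            return True
        return False
```

Note that this sketch only checks the clock when a record arrives. A
real setup would also call maybe_commit() from a timer or cron job, so
that the last few updates do not wait for the next incoming record.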
> How many hours per million records for a complete
> re-index?
For indexing rates see above. There is no significant difference whether
it's re-indexing, updating or new indexing...
> Does this time grow linearly or exponentially?
About linearly.
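For those who want numbers: a small back-of-the-envelope Python sketch
that turns the indexing rates above into hours. It assumes the rate
stays constant as the index grows (that is, strictly linear scaling),
which is only approximately true. The names hours_for, RATES and
estimate are made up for this example.

```python
def hours_for(records, records_per_second):
    """Hours needed to index `records` at a constant rate."""
    return records / records_per_second / 3600.0


# Rates from this post, in records per second (slowest, fastest).
RATES = {
    "bibliographic": (200, 500),
    "fulltext articles": (50, 80),
}


def estimate(records):
    """Return {kind: (best_case_hours, worst_case_hours)}."""
    return {
        kind: (hours_for(records, fast), hours_for(records, slow))
        for kind, (slow, fast) in RATES.items()
    }


if __name__ == "__main__":
    for n in (1_000_000, 20_000_000):
        for kind, (best, worst) in estimate(n).items():
            print(f"{n:>12,} {kind}: {best:.1f} to {worst:.1f} hours")
```

That works out to roughly 0.6 to 1.4 hours per million bibliographic
records and roughly 3.5 to 5.6 hours per million fulltext articles.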
Regards and sorry for this rather technical post,
Till
--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinstler@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de
Received on Fri Mar 13 2009 - 04:29:13 EDT