Re: Pinyin (Romanized Chinese) searching

From: Ya'aqov Ziso <ziso_at_nyob> Date: Thu, 29 Oct 2009 12:21:59 -0400 To: CODE4LIB_at_LISTSERV.ND.EDU

Till, Bess, Ralph,

>> Assuming an algorithm for enriching the spellings of a word (from Pinyin to
Chinese) exists, will the result include both forms, Pinyin AND non-Pinyin
Chinese transliteration, and will BOTH forms be indexed?
>> The principle of indexing different spellings of a certain word already
exists in name authority records, where a name (for ex. Pushkin) has over 30
different spellings. The string of different names can be expanded (Ralph's work
with VIAF and with fuzzy logic in WorldCat/identities is definitely relevant
here).
Maybe something is already underway (?)
>> How large will the resulting index be? Manageable for medium-small
installations of VuFind?
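
To make the "index both forms" question concrete, here is a rough sketch in
Python (not Solr or VuFind code). It files each record under every variant
form of its heading, so a search on any form finds the record. The VARIANTS
table and the build_index helper are hypothetical illustrations, not VIAF data
or an existing API.

```python
from collections import defaultdict

# Hypothetical variant table; in practice this would come from authority data.
VARIANTS = {
    "pushkin": ["pushkin", "puschkin", "pouchkine"],
}

def build_index(records):
    """Map every variant form of each heading to the ids of records carrying it."""
    index = defaultdict(set)
    for rec_id, heading in records.items():
        key = heading.lower()
        # Headings without known variants are indexed under their own form only.
        for form in VARIANTS.get(key, [key]):
            index[form].add(rec_id)
    return index

records = {1: "Pushkin", 2: "Gogol"}
index = build_index(records)
```

In this sketch the number of index terms grows with the number of variant
forms per heading, not with the number of records, which is the quantity the
index-size question depends on.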

Ya'aqov Ziso, Electronic Resource Management Librarian, Rowan University 856
256 4804

On 10/29/09 11:34 AM, "Till Kinstler" <kinstler_at_GBV.DE> wrote:

> Bess Sadler wrote:
> 
>> > So, thoughts? Anyone know more about this than I do and want to speak up?
> 
> I'd second Demian's and Jonathan's statements: Do that in Solr by using
> a Filter (either at indexing or search time).
> You want to solve that using an algorithm that translates American
> transcription into Chinese, correct? If you have that algorithm (is
> there one?), it's a perfect job for a filter and I guess there are use
> cases outside libraryland as well. It's not only us dealing with
> transcription of Chinese...
> If I misunderstood your approach and you want to use a dictionary to map
> the different transcriptions, solr.SynonymFilterFactory could provide a
> solution. 
> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory)
> 
> 
> Till
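
The dictionary approach Till describes can be sketched outside Solr as a small
token filter: each token passes through unchanged, and any transcription
variants listed for it are emitted alongside it. This is only an illustration
of the idea in Python, not the SynonymFilterFactory API; the SYNONYMS table
(one Wade-Giles/Pinyin pair) and the expand_tokens function are assumptions
made up for the example.

```python
# Hypothetical mapping between two transcriptions of the same syllables
# (Wade-Giles "tse-tung" and Pinyin "zedong").
SYNONYMS = {
    "tse-tung": ["zedong"],
    "zedong": ["tse-tung"],
}

def expand_tokens(tokens):
    """Return the tokens with every known variant added after its original."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))
    return expanded

index_terms = expand_tokens("mao tse-tung".split())
```

Applied at indexing time, a step like this stores both spellings, so a query
in either transcription matches; applied at search time instead, it leaves the
index unchanged and expands the query.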