Monday, April 21, 2014

Can we make the compound word filter not split certain words in Solr? - Stack Overflow


I am using a DictionaryCompoundWordTokenFilterFactory. I have a Dutch dictionary file, compound_words_dict_nl.txt, which contains the following words:
pen
slot
knop


With this dictionary, the filter splits the words penslot and knoppen.


The problem is that I don't want knoppen to be treated as a compound word. It is the plural of knop. The filter divides it into knop and pen, so a search returns results containing either knop or pen. But knoppen should not generate a pen token, only the stemmed form knop (which I already cover with a stemming filter in the analyzer).


If I remove pen from the dictionary, the filter adds only slot as a token for penslot, which is not what I want either.
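To make the two cases above concrete, here is a minimal Python sketch of dictionary-based decompounding. It is a simplified model and not Lucene's actual implementation: the function name `decompound` and the `min_subword_size` parameter are assumptions made for illustration, and the real filter has additional settings that are ignored here.

```python
def decompound(word, dictionary, min_subword_size=2):
    """Return the original word followed by every dictionary entry found inside it.

    Simplified model: scan every substring of at least min_subword_size
    characters and emit it when it appears in the dictionary.
    """
    tokens = [word]
    for start in range(len(word)):
        for end in range(start + min_subword_size, len(word) + 1):
            part = word[start:end]
            if part in dictionary:
                tokens.append(part)
    return tokens


dictionary = {"pen", "slot", "knop"}

print(decompound("penslot", dictionary))            # ['penslot', 'pen', 'slot']
print(decompound("knoppen", dictionary))            # ['knoppen', 'knop', 'pen']
print(decompound("penslot", dictionary - {"pen"}))  # ['penslot', 'slot']
```

In this model the unwanted pen token for knoppen and the missing pen token for penslot both come from the same dictionary entry, so editing the dictionary alone cannot satisfy both cases.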


Is there an easy workaround for this type of issue, or do I need to write a custom filter?
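For illustration only, the kind of custom filter being asked about could behave like the Python sketch below: words on an exception list pass through unsplit, and everything else is decompounded as usual. This is a hypothetical model, not a Solr or Lucene API; the names `decompound_with_exceptions` and `protected` are invented here, and a real solution would have to be written as a Java token filter.

```python
def decompound_with_exceptions(word, dictionary, protected, min_subword_size=2):
    """Decompound a word unless it is on a (hypothetical) exception list."""
    # Protected words such as plurals are returned as-is, so a later
    # stemming step can still reduce them to their base form.
    if word in protected:
        return [word]
    tokens = [word]
    for start in range(len(word)):
        for end in range(start + min_subword_size, len(word) + 1):
            part = word[start:end]
            if part in dictionary:
                tokens.append(part)
    return tokens


dictionary = {"pen", "slot", "knop"}
protected = {"knoppen"}  # assumed exception list, maintained by hand

print(decompound_with_exceptions("knoppen", dictionary, protected))  # ['knoppen']
print(decompound_with_exceptions("penslot", dictionary, protected))  # ['penslot', 'pen', 'slot']
```

The trade-off in this sketch is that every problematic plural has to be listed explicitly, which may not scale to a large vocabulary.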




