I am using a DictionaryCompoundWordTokenFilterFactory. I have a Dutch dictionary file, compound_words_dict_nl.txt, which contains the following words:
pen
slot
knop
I use this dictionary to divide the words penslot and knoppen.
The problem is that I don't want knoppen to be treated as a compound word. It is the plural of knop. The filter divides it into knop and pen, so a search returns results containing knop and pen. But knoppen shouldn't generate a pen token, just the stemmed form knop (which I already cover with a stemming filter in the analyzer).
If I remove pen from the dictionary, the filter adds only slot as a token for penslot, which I don't want either.
Is there an easy workaround for this type of issue, or do I need to create a custom filter?
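To make the behaviour concrete, here is a minimal sketch in plain Python (not the Lucene API) of how I understand dictionary-based decompounding to work: every substring of the token is looked up in the dictionary, and each hit is emitted as an extra token next to the original. The function name `decompound`, the `protected` parameter, and the `PROTECTED` set are hypothetical names made up for this illustration, and the size limits are only assumed to resemble the filter's defaults. The `protected` set shows one possible shape of a workaround: an exception list that is checked before any splitting happens.

```python
def decompound(token, dictionary, min_word_size=5, min_subword_size=2,
               max_subword_size=15, protected=frozenset()):
    """Return the original token followed by every dictionary word found inside it.

    Illustrative approximation only; `protected` is a hypothetical exception list.
    """
    tokens = [token]
    # Short tokens and protected tokens are passed through without splitting.
    if len(token) < min_word_size or token in protected:
        return tokens
    lowered = token.lower()
    # Try every start offset and every allowed subword length.
    for start in range(len(lowered) - min_subword_size + 1):
        for length in range(min_subword_size, max_subword_size + 1):
            end = start + length
            if end > len(lowered):
                break
            part = lowered[start:end]
            if part in dictionary:
                tokens.append(part)
    return tokens


DICTIONARY = {"pen", "slot", "knop"}
PROTECTED = {"knoppen"}  # hypothetical list of words that must not be split

plain_penslot = decompound("penslot", DICTIONARY)
plain_knoppen = decompound("knoppen", DICTIONARY)
guarded_penslot = decompound("penslot", DICTIONARY, protected=PROTECTED)
guarded_knoppen = decompound("knoppen", DICTIONARY, protected=PROTECTED)

print(plain_penslot)
print(plain_knoppen)
print(guarded_penslot)
print(guarded_knoppen)
```

Without the exception list, knoppen picks up both knop and pen, which is the unwanted case. With it, knoppen passes through untouched for the stemmer to handle, while penslot is still split. Newer Solr versions include a ProtectedTermFilterFactory that may provide this kind of exception list without a custom filter, but that would need checking against the Solr version in use.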