Query expansion via synonyms is a powerful feature that is easy to configure. However, the current limit of 500 entries across all expanded_terms arrays is too low for most use cases. For enterprise corpora covering many topics, we estimate that a limit of 10k to 30k synonyms per collection would be required in order to apply the available thesauri. Scientific applications (for example diseases, drugs, and proteins) might require an even higher limit. If there is a workaround via enrichment, that would of course also be a solution.
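To make the limit concrete, here is a minimal sketch of how the entries might be counted. It assumes a synonym list is a list of rules that each hold an expanded_terms array, and that the limit applies to the sum of all array lengths, as described above. The function names, the example terms, and the constant are hypothetical and only for illustration; the figure of 500 is taken from the text.

```python
from typing import Dict, List

# Assumption: limit as stated in this request; the actual value may differ.
MAX_EXPANDED_TERMS = 500


def count_expanded_terms(expansions: List[Dict[str, List[str]]]) -> int:
    """Return the total number of entries across all expanded_terms arrays."""
    return sum(len(rule.get("expanded_terms", [])) for rule in expansions)


def fits_limit(expansions: List[Dict[str, List[str]]],
               limit: int = MAX_EXPANDED_TERMS) -> bool:
    """Check whether a synonym list stays within the assumed limit."""
    return count_expanded_terms(expansions) <= limit


# Hypothetical synonym list with two rules of three terms each.
example_expansions = [
    {"expanded_terms": ["car", "automobile", "vehicle"]},
    {"expanded_terms": ["aspirin", "acetylsalicylic acid", "ASA"]},
]

if __name__ == "__main__":
    total = count_expanded_terms(example_expansions)
    print(f"{total} entries, within limit: {fits_limit(example_expansions)}")
```

Under these assumptions, a thesaurus with only 250 bidirectional synonym pairs would already use up the full limit, which is far below the 10k to 30k range estimated above.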
Why is it useful?
Who would benefit from this IDEA?
As a user of a search engine I want to obtain search results with high recall without having to think about all possible synonyms so that I do not miss any relevant documents or passages.
How should it work?