Abstract: Elasticsearch is one of the most popular full-text search and analytics engines. It can store, search, and analyze large volumes of data in near real time. Before data can be searched, Elasticsearch builds inverted indexes over it, and the analyzer plays a crucial role in this process. An appropriate analyzer segments text into semantically meaningful words, which can significantly improve query accuracy. However, the default analyzer has limited effectiveness on Chinese text. Existing methods generally replace the default analyzer through manual configuration to improve query results, but the service downtime caused by manually updating analyzers is often unacceptable in production environments. Based on Elasticsearch’s RESTful API, we have implemented a framework for dynamically configuring analyzers in a cluster environment. The framework supports common Chinese analyzers and provides a visual interface. Experiments show that the proposed framework reduces the time cost of updating and maintaining analyzers in an online environment by 94% compared with manual updates. In addition, compared with Elasticsearch’s default analyzer configuration, the accuracy of a system based on this framework improves by 30%.
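To make the idea of reconfiguring an analyzer over the REST API concrete, the following is a minimal sketch, not the framework described in this paper. It only builds the ordered requests that Elasticsearch expects when a custom analyzer is added to an existing index: close the index, update its analysis settings, then reopen it. The index name `articles`, the analyzer name `zh_analyzer`, and the helper `build_analyzer_update_plan` are hypothetical, and the `ik_max_word` tokenizer assumes the third-party IK analysis plugin is installed on every node.

```python
import json


def build_analyzer_update_plan(index, analyzer_name, tokenizer):
    """Return the ordered (method, path, body) REST calls that add a
    custom analyzer to an existing index. Nothing is sent over the network."""
    # Analysis settings are static index settings, so the index is closed
    # before the update and reopened afterwards.
    settings = {
        "analysis": {
            "analyzer": {
                # Hypothetical analyzer definition; the tokenizer name is an
                # assumption and must exist on the cluster (e.g. via a plugin).
                analyzer_name: {"type": "custom", "tokenizer": tokenizer}
            }
        }
    }
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", json.dumps(settings)),
        ("POST", f"/{index}/_open", None),
    ]


# Example with illustrative names only.
plan = build_analyzer_update_plan("articles", "zh_analyzer", "ik_max_word")
for method, path, body in plan:
    print(method, path, body or "")
```

Each tuple could be sent with any HTTP client. Note that documents indexed earlier keep the tokens produced by the previous analyzer until they are reindexed, which is one reason manual analyzer changes are costly in production.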