Abstract: Elasticsearch (ES) is a distributed RESTful search engine optimized for large-scale and long-text search scenarios. Recent Text-to-Query research uses large language models (LLMs) to convert user query intent into executable code, and the topic is attracting growing attention. We are the first to introduce text-to-ES, a semantic parsing task that bridges the gap between LLMs and ES by using LLMs to generate Domain-Specific Language (DSL) queries and the corresponding post-processing code needed to support multi-index ES queries. We propose the text-to-ES benchmark, which consists of two datasets: the Large Elasticsearch Dataset (LED), containing 26,207 text-ES pairs derived from a 224.9GB schema-free database, and the BirdES dataset, containing 10,926 pairs sourced from the Bird dataset on a 33.4GB schema-fixed database. Our trained model outperforms DeepSeek-R1 by 15.63% on the LED dataset, setting a new state of the art, and achieves 78% of DeepSeek-R1's performance on the BirdES dataset. We also provide in-depth experimental analyses and suggest future research directions for this task. We plan to release our code and datasets.
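To make the task concrete, the following is a minimal sketch of what a single text-ES pair might look like: a natural-language question, the query DSL for two indices, and post-processing code that merges the two responses. It is an illustration only, not an example from LED or BirdES. The index names (`orders`, `customers`), all field names, and the helper functions are hypothetical, and the responses are mocked in the standard ES response shape so that the snippet runs without a cluster.

```python
# Hypothetical text-ES pair: question -> DSL per index -> post-processing.
question = "Which customers spent the most on shipped orders since 2024-01-01?"

# DSL for a hypothetical `orders` index: filter, then sum `amount` per customer.
orders_dsl = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "shipped"}},
                {"range": {"order_date": {"gte": "2024-01-01"}}},
            ]
        }
    },
    "aggs": {
        "by_customer": {
            "terms": {"field": "customer_id", "size": 10},
            "aggs": {"total": {"sum": {"field": "amount"}}},
        }
    },
}


def customers_dsl(customer_ids):
    """DSL for a hypothetical `customers` index: look up the given ids."""
    return {"size": len(customer_ids), "query": {"terms": {"customer_id": customer_ids}}}


def post_process(orders_resp, customers_resp):
    """Join per-customer totals with customer names, highest total first."""
    names = {
        hit["_source"]["customer_id"]: hit["_source"]["name"]
        for hit in customers_resp["hits"]["hits"]
    }
    buckets = orders_resp["aggregations"]["by_customer"]["buckets"]
    rows = [(names.get(b["key"], b["key"]), b["total"]["value"]) for b in buckets]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Mocked responses standing in for the results of the two searches.
orders_resp = {
    "aggregations": {
        "by_customer": {
            "buckets": [
                {"key": "c1", "doc_count": 2, "total": {"value": 150.0}},
                {"key": "c2", "doc_count": 3, "total": {"value": 300.0}},
            ]
        }
    }
}
customers_resp = {
    "hits": {
        "hits": [
            {"_source": {"customer_id": "c1", "name": "Ada"}},
            {"_source": {"customer_id": "c2", "name": "Grace"}},
        ]
    }
}

result = post_process(orders_resp, customers_resp)
print(result)
```

The join is done in Python because ES has no general cross-index join, which is one plausible reason a multi-index query needs post-processing code in addition to DSL.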
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; NLP datasets
Contribution Types: Data resources
Languages Studied: English
Submission Number: 1284