Keywords: Large Language Model; Text-to-SQL
TL;DR: A novel, high-performance text-to-SQL method based on open-source LLMs
Abstract: The text-to-SQL task seeks to bridge natural language questions and database query systems by converting user questions into executable SQL queries. While recent advances in large language models (LLMs) have significantly improved accuracy on this task, current methods often rely on proprietary LLMs, which bring high costs, limited accessibility, and data privacy concerns. This paper presents SageSQL, a novel multi-agent framework that leverages open-source LLMs to address these challenges. SageSQL introduces a robust schema linking process that accurately identifies the relevant database components, followed by a diverse SQL generation module that maximizes structural variety in the generated queries. A self-consistency-based post-processing mechanism then refines the final SQL output. Experimental results on the Spider and BIRD benchmarks demonstrate that SageSQL outperforms state-of-the-art methods based on open-source LLMs and is competitive with methods based on proprietary LLMs, highlighting its potential as a cost-effective, privacy-preserving solution for complex text-to-SQL tasks.
Submission Number: 27