[AML] 23. Enhancing Text-to-SQL with Open-source LLMs

THU 2024 Winter AML Submission27 Authors

11 Dec 2024 (modified: 02 Mar 2025) | THU 2024 Winter AML Submission | Readers: Everyone | CC BY 4.0

Keywords: Large Language Model; Text-to-SQL

TL;DR: A novel, high-performance text-to-SQL method based on open-source LLMs.

Abstract: The text-to-SQL task seeks to bridge natural language questions and database query systems by converting user questions into executable SQL queries. While recent advances in large language models (LLMs) have significantly improved accuracy on this task, current methods often rely on proprietary LLMs, which bring high costs, limited accessibility, and data privacy concerns. This paper presents SageSQL, a novel multi-agent framework that leverages open-source LLMs to address these challenges. SageSQL introduces a robust schema linking process that accurately identifies the relevant database components, followed by a diverse SQL generation module that maximizes structural variety in the generated queries. A self-consistency-based post-processing mechanism then refines the final SQL output. Experimental results on the Spider and BIRD benchmarks demonstrate that SageSQL outperforms state-of-the-art methods based on open-source LLMs and achieves performance competitive with methods based on proprietary LLMs, highlighting its potential as a cost-effective, privacy-preserving solution for complex text-to-SQL tasks.
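
The abstract does not spell out how the self-consistency step works. One common reading, sketched below purely as an illustration and not as SageSQL's actual implementation, is to execute every candidate query and keep the one whose result set is returned by the most candidates. The function names, the `singer` table, and the candidate queries are all hypothetical; only Python's standard `sqlite3` and `collections` modules are used.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run one candidate query; return an order-insensitive result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # invalid candidates get no vote
    return tuple(sorted(rows, key=repr))


def pick_by_self_consistency(conn, candidates):
    """Return the first candidate whose execution result is the most common one."""
    results = [execute(conn, sql) for sql in candidates]
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None  # every candidate failed to execute
    winner, _ = votes.most_common(1)[0]
    for sql, result in zip(candidates, results):
        if result == winner:
            return sql


# Hypothetical toy database and candidate queries (assumptions for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 30), ("Bob", 45), ("Cy", 52)],
)

candidates = [
    "SELECT name FROM singer WHERE age > 40",
    "SELECT s.name FROM singer AS s WHERE s.age >= 41",
    "SELECT name FROM singer WHERE age > 50",
    "SELECT nme FROM singer",  # misspelled column, fails to execute
]

best = pick_by_self_consistency(conn, candidates)
print(best)
```

In this toy run the first two candidates differ structurally but return the same rows, so they outvote the third, and the failing fourth candidate is ignored. Voting on execution results rather than on SQL text is what allows structurally diverse candidates to agree with one another.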

Submission Number: 27