Abstract: To improve the performance of large language models (LLMs) on Text-to-SQL tasks, we propose DFM-SQL, a framework that combines several strategies for generating and selecting candidate SQL statements. Specifically, we build a generator system composed of multiple LLMs to produce a diverse, high-quality set of candidate SQL queries. The generator relies on two core methods: first, a divide-and-conquer strategy that breaks a complex query down into manageable sub-queries within a single LLM call; second, an in-domain knowledge base for the database schema, constructed with LLMs, that strengthens contextual understanding. To ensure the quality of the output, a dedicated selector agent refines and selects high-quality SQL queries from those produced by the generator. In addition, we apply a few-shot learning approach in which LLMs fix and refine the candidate SQL queries to improve accuracy. Experimental results show that DFM-SQL improves the quality and diversity of the generated SQL queries and substantially narrows the gap between execution accuracy and exact match accuracy. On the Spider Text-to-SQL benchmark, DFM-SQL achieves an execution accuracy of 85.3% and an exact match accuracy of 86.3%, a gap of only 1 percentage point between the two metrics. This result marks a new level of consistency between execution accuracy and exact match accuracy and sets a new state of the art (SOTA) for exact match accuracy.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Divide and Conquer, In-domain knowledge base, Candidate Selection, Fixing, Text-to-SQL, LLM
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 7626
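
The abstract contrasts execution accuracy with exact match accuracy. As a minimal illustration of why the two metrics can diverge, the Python sketch below runs two differently written queries against an in-memory SQLite database. The `singer` table, the two queries, and the helper names `execution_match` and `naive_exact_match` are hypothetical. The text-based comparison is a deliberately simplified stand-in for Spider's component-level exact match, not the official evaluation script or the authors' code.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same rows, ignoring row order."""
    predicted_rows = conn.execute(predicted_sql).fetchall()
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows) == sorted(gold_rows)


def naive_exact_match(predicted_sql, gold_sql):
    """Simplified assumption: compare case- and whitespace-normalized query text.

    Lowercasing also affects string literals, so this is only a rough proxy.
    """
    def normalize(sql):
        return " ".join(sql.lower().split()).rstrip(";")

    return normalize(predicted_sql) == normalize(gold_sql)


# Hypothetical toy database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 25), (3, 'Cy', 41);
""")

gold = "SELECT name FROM singer WHERE age > 28"
predicted = "SELECT name FROM singer WHERE NOT age <= 28"

exec_ok = execution_match(conn, predicted, gold)   # same rows returned
exact_ok = naive_exact_match(predicted, gold)      # different query text
print(exec_ok, exact_ok)  # prints: True False
```

Here the predicted query counts as correct under execution-based scoring but not under text-based matching, which is the kind of disagreement the gap between the two metrics reflects.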