DFM-SQL: A Multi-Approach Framework with Candidate Selection and Fixing for Text-to-SQL

ACL ARR 2025 February Submission7626 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0

Abstract: To improve the performance of large language models (LLMs) on Text-to-SQL tasks, we propose DFM-SQL, a framework that integrates multiple strategies to enhance the generation and selection of candidate SQL statements. Specifically, we develop a multi-LLM generator system that produces a diverse, high-quality set of candidate SQL queries. The generator employs two core methods: first, a divide-and-conquer strategy that breaks a complex query into manageable sub-queries within a single LLM call; second, the construction of an in-domain knowledge base for the database schema, built with LLMs, to enhance contextual understanding. To ensure the quality of the generated SQL statements, we also develop a dedicated selector agent that refines and selects high-quality SQL queries from those produced by the generator. Additionally, we employ a few-shot learning approach in which LLMs refine and fix the candidate SQL queries for improved accuracy. Experimental results show that DFM-SQL not only improves the quality and diversity of SQL queries but also substantially narrows the gap between execution accuracy and exact match accuracy. On the Spider Text-to-SQL benchmark, DFM-SQL achieves an execution accuracy of 85.3% and an exact match accuracy of 86.3%, a difference of only 1 percentage point between the two metrics. This result marks a new level of consistency between execution accuracy and exact match accuracy and pushes exact match accuracy to a new state-of-the-art (SOTA) level.
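To make the two metrics named in the abstract concrete, the following is a minimal sketch of how execution accuracy and exact match can disagree on a single example. It is not the authors' evaluation code: the `singer` table, the helper names `normalize`, `exact_match`, and `execution_match`, and the string-based exact-match check are all assumptions made for illustration. In particular, Spider's official exact match compares parsed SQL components, whereas this sketch only compares normalized strings.

```python
import sqlite3
from collections import Counter


def normalize(sql: str) -> str:
    # Crude normalization (assumption): lowercase, drop a trailing
    # semicolon, collapse whitespace. This also lowercases string
    # literals, which a real evaluator would not do.
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    # String-level stand-in for exact match (hypothetical helper).
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    # Execute both queries and compare result rows as multisets,
    # ignoring row order. A predicted query that fails to run
    # counts as a mismatch.
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


# Toy in-memory database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (name TEXT, age INTEGER);
    INSERT INTO singer VALUES ('Ann', 30), ('Bob', 45), ('Cy', 52);
    """
)

gold = "SELECT name FROM singer WHERE age > 40"
pred = "SELECT name FROM singer WHERE NOT age <= 40"

print("exact match:    ", exact_match(pred, gold))
print("execution match:", execution_match(conn, pred, gold))
```

Here the predicted query returns the same rows as the gold query but is written differently, so it counts toward execution accuracy and not toward exact match. Disagreements of this kind are one source of the gap between the two metrics that the abstract discusses.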

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: Divide and Conquer, Intra-domain knowledge base, Candidate Selection, Fixing, Text-to-SQL, LLM

Contribution Types: Approaches to low-resource settings

Languages Studied: English

Submission Number: 7626
