READ-SQL: Reasoning Path Decomposer for Text-to-SQL

Yaxun Dai; Haiqin Yang; Mou Hao; Pingfu Chao

27 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | License: CC BY 4.0

Keywords: Text-to-SQL, Tabular Reasoning, SQL Decomposition, Abstract Syntax Trees, Self-Correction

Abstract: Text-to-SQL is a longstanding task that aims to automatically convert natural language questions into SQL queries for database retrieval. Despite impressive advances, particularly with Large Language Models (LLMs), existing methods still struggle with misinterpreted, omitted, or unwanted constraints. To address these challenges, we propose READ-SQL, a novel framework that employs a reasoning path decomposer, READER (REasoning pAth DecomposER), for text-to-SQL tasks. READER decomposes SQLs into clauses, sub-SQLs, and reasoning paths, supporting data preparation and confidence-level determination in post-processing. READ-SQL comprises two main models, a Generator and a Corrector, both trained via LoRA for parameter efficiency. Based on READER's decomposition, READ-SQL uses an LLM to generate two types of augmented data: question/SQL pairs and question/reason pairs. The Generator is trained on both original and augmented data to identify constraint changes and enhance reasoning. The Corrector is trained on data from READER’s post-processing, improving self-correction by refining high-confidence SQLs and addressing low-confidence elements. Extensive experiments show that READ-SQL significantly outperforms leading baselines, with READ-SQL-3B achieving 57.37% execution accuracy on BIRD’s dev set, surpassing several 7B-parameter models and setting a new state of the art with fewer parameters. Additionally, READER and the Corrector show broad applicability when integrated with LLMs or other base models.
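
To make the idea of decomposing a SQL query into clauses and sub-SQLs concrete, here is a minimal Python sketch. It is not the authors' READER: the keywords above suggest READER works on abstract syntax trees, whereas this illustration only tracks parenthesis depth and matches clause keywords with a regular expression. The function names `split_clauses` and `sub_sqls` are hypothetical, reasoning paths are not modeled, and string literals containing parentheses or SQL keywords are not handled.

```python
import re

# Top-level clause keywords considered by this sketch (an assumption, not READER's list).
CLAUSE_KEYWORDS = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def split_clauses(sql):
    """Split a SQL string into (keyword, body) pairs at parenthesis depth 0."""
    # Record the nesting depth at every character position.
    depths, depth = [], 0
    for ch in sql:
        if ch == ")":
            depth -= 1
        depths.append(depth)
        if ch == "(":
            depth += 1
    # Keep only keyword matches that sit outside any parentheses.
    hits = [m for m in _KEYWORD_RE.finditer(sql) if depths[m.start()] == 0]
    clauses = []
    for i, m in enumerate(hits):
        end = hits[i + 1].start() if i + 1 < len(hits) else len(sql)
        keyword = re.sub(r"\s+", " ", m.group(1).upper())
        clauses.append((keyword, sql[m.end():end].strip()))
    return clauses


def sub_sqls(sql):
    """Return the text of each parenthesized span that starts with SELECT."""
    found, stack = [], []
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            inner = sql[stack.pop() + 1:i].strip()
            if inner.upper().startswith("SELECT"):
                found.append(inner)
    return found


if __name__ == "__main__":
    query = (
        "SELECT name FROM users WHERE id IN "
        "(SELECT user_id FROM orders WHERE total > 100) ORDER BY name LIMIT 5"
    )
    for keyword, body in split_clauses(query):
        print(keyword, "->", body)
    print(sub_sqls(query))
```

Each sub-SQL returned by `sub_sqls` can be passed back to `split_clauses`, which gives a rough recursive view of the query's constraints. A production decomposer would more likely rely on a real SQL parser.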

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 9608
