Keywords: Text-to-SQL, Reasoning, Self-improving Models, Synthetic Data Generation, RAG, LLM
Abstract: Recent advances in Text-to-SQL have greatly benefited from large language models, yet small and medium-sized models still suffer from frequent execution errors and limited self-correction ability. We present ReSQL (Retrieval-augmented error reasoning for Text-to-SQL), a self-improving framework that generates and learns from its own error-reasoning dataset, enabling models to autonomously refine their SQL generation and correction capabilities. ReSQL combines feedback-driven fine-tuning with retrieval-based inference: it gathers model-generated errors, analyzes them through structured feedback prompts, and retrieves relevant correction examples during inference. This unified approach allows models to internalize robust error-reasoning patterns and apply them dynamically to unseen queries. Experimental results on the Spider and BIRD benchmarks show that ReSQL substantially improves execution accuracy and self-correction ability over strong baselines, achieving performance competitive with much larger proprietary models such as GPT-4. Our findings highlight ReSQL as a promising step toward self-improving, reasoning-aware Text-to-SQL systems that can continually enhance their reliability and interpretability without external supervision. All code and generated reasoning datasets are publicly released to facilitate application to open-source LLMs and reproducible baseline training.
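The retrieval-based inference step described in the abstract (looking up relevant correction examples for a new execution error) can be sketched minimally as follows. This is an illustrative outline only, not the paper's implementation: the toy bag-of-words similarity stands in for a real embedding model, and the error/correction pairs in `MEMORY` are hypothetical examples.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words vector for an error message (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Error-reasoning memory: (error message, correction note) pairs collected
# from the model's own failed generations. Contents are illustrative.
MEMORY = [
    ("no such column: singer.nam", "check the schema; the column is singer.Name"),
    ("ambiguous column name: id", "qualify id with its table alias"),
    ("syntax error near GROUP", "GROUP BY must precede ORDER BY"),
]

def retrieve_corrections(error_msg, k=2):
    """Return the k stored correction notes most similar to a new execution error."""
    query = embed(error_msg)
    ranked = sorted(MEMORY, key=lambda pair: cosine(query, embed(pair[0])),
                    reverse=True)
    return [note for _, note in ranked[:k]]
```

In the full framework, the retrieved notes would be concatenated into the correction prompt shown to the model alongside the failed SQL and its execution error.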
Paper Type: Long
Research Area: Hierarchical Structure Prediction, Syntax, and Parsing
Research Area Keywords: semantic parsing
Contribution Types: Approaches for low-compute settings (efficiency), Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 7376