Keywords: Text-to-SQL, Multi-turn Tool-Integrated Reasoning, Reinforcement Learning
TL;DR: MTIR-SQL: A reinforcement learning framework that enhances Text-to-SQL through multi-turn reasoning with real-time database feedback integration.
Abstract: As large language models (LLMs) are increasingly applied to Text-to-SQL tasks, reinforcement learning (RL) has become a common method for improving their performance. Existing methods rely primarily on static execution feedback, which restricts real-time error correction. Integrating multi-turn tool invocation with dynamic feedback could improve adaptability and robustness and, in turn, model performance. To address this limitation, we propose MTIR-SQL, a Multi-turn Tool-Integrated Reasoning (MTIR) reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to multi-turn interaction scenarios. Because MTIR training is unstable and the model distribution can deviate significantly from that of the initial model, we modify GRPO by adding a trajectory filtering mechanism and removing the KL loss constraint. Experimental results show that MTIR-SQL, with 4B parameters, achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on SPIDER Dev, significantly outperforming existing approaches.
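The two ideas named in the abstract, per-step execution feedback and group-relative advantages with trajectory filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: `propose_sql` is a hypothetical stand-in for the policy model, SQLite is assumed as the database, and the filtering rule (zero advantage for trajectories flagged invalid) is an assumption, since the abstract does not state the filtering criterion. No KL penalty appears because the abstract says that term is removed.

```python
import sqlite3
from statistics import mean, pstdev


def execute_sql(conn, sql, max_rows=5):
    """Run a query and return (ok, feedback_text) to show the model next turn."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows)
        return True, f"rows: {rows}"
    except sqlite3.Error as exc:
        return False, f"error: {exc}"


def multi_turn_rollout(conn, propose_sql, question, max_turns=3):
    """Generate SQL, execute it, and feed the result back until it runs.

    propose_sql(question, history) -> str is a hypothetical policy interface.
    """
    history = []
    for _ in range(max_turns):
        sql = propose_sql(question, history)
        ok, feedback = execute_sql(conn, sql)
        history.append((sql, feedback))
        if ok:  # stop refining once the query executes without error
            break
    return history


def group_advantages(rewards, valid):
    """Group-relative advantages computed over valid trajectories only.

    Assumed filtering rule: trajectories with valid=False get advantage 0,
    so they contribute no gradient signal.
    """
    kept = [r for r, v in zip(rewards, valid) if v]
    if len(kept) < 2:
        return [0.0] * len(rewards)
    mu, sigma = mean(kept), pstdev(kept)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma if v else 0.0 for r, v in zip(rewards, valid)]
```

In this sketch the database error message itself is the feedback appended to the history, so a second attempt can condition on why the first query failed.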
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10607