GBV-SQL: Guided Generation and SQL2Text Back-Translation Validation for Multi-Agent Text2SQL


ACL ARR 2026 January Submission2247 Authors

02 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Text2SQL, Multi-Agent, Semantic Validation, Large Language Models, Benchmark Curation, Gold Errors

Abstract: While Large Language Models have significantly advanced Text2SQL generation, a critical semantic gap persists: syntactically valid queries often misinterpret user intent. To address this gap, we propose GBV-SQL, a novel multi-agent framework that introduces Guided Generation with SQL2Text Back-translation Validation. In this mechanism, a specialized agent translates the generated SQL back into natural language so that its logical alignment with the original question can be verified. Critically, our investigation reveals that current evaluation is undermined by a systemic issue: the poor quality of the benchmarks themselves. We introduce a formal typology of "Gold Errors", pervasive flaws in the ground-truth data, and demonstrate how they obscure true model performance. On the challenging BIRD benchmark, GBV-SQL achieves 63.23% execution accuracy, a 5.8% absolute improvement. After removing flawed examples from Spider and repairing flawed examples in BIRD, GBV-SQL achieves 96.5% (dev) and 97.6% (test) execution accuracy on Spider, and 90.42% on the corrected BIRD dataset. Our work offers both a robust framework for semantic validation and a critical perspective on benchmark integrity, highlighting the need for more rigorous dataset curation.
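The abstract does not specify the agents' prompts, the alignment criterion, or the retry policy, so the following Python sketch only illustrates one plausible way to wire a generate, back-translate, and validate loop. It assumes each agent is an injected callable that would wrap an LLM call in practice; the names `generate_with_backtranslation`, `ValidationResult`, `max_rounds`, and the feedback-driven retry are hypothetical and are not GBV-SQL's published interface. The toy agents at the bottom stand in for real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical agent signatures (assumptions); in practice each would wrap an LLM call.
GenerateFn = Callable[[str, str, Optional[str]], str]  # (question, schema, feedback) -> SQL
BackTranslateFn = Callable[[str], str]                 # SQL -> natural-language description
AlignFn = Callable[[str, str], Optional[str]]          # (question, description) -> None if aligned, else feedback


@dataclass
class ValidationResult:
    sql: str
    aligned: bool
    rounds: int
    feedback_log: List[str] = field(default_factory=list)


def generate_with_backtranslation(
    question: str,
    schema: str,
    generate: GenerateFn,
    back_translate: BackTranslateFn,
    check_alignment: AlignFn,
    max_rounds: int = 3,
) -> ValidationResult:
    """Generate SQL, translate it back to text, and regenerate on a semantic mismatch."""
    feedback: Optional[str] = None
    log: List[str] = []
    sql = ""
    for round_idx in range(1, max_rounds + 1):
        sql = generate(question, schema, feedback)
        description = back_translate(sql)              # SQL2Text step
        feedback = check_alignment(question, description)
        if feedback is None:                           # description matches the question's intent
            return ValidationResult(sql, True, round_idx, log)
        log.append(feedback)                           # feed the mismatch into the next attempt
    # Budget exhausted: return the last candidate, flagged as unvalidated.
    return ValidationResult(sql, False, max_rounds, log)


# Toy stand-in agents (hypothetical) showing one failed check followed by a successful retry.
def toy_generate(question: str, schema: str, feedback: Optional[str]) -> str:
    return "SELECT name FROM singer" if feedback is None else "SELECT name FROM singer WHERE age > 30"


def toy_back_translate(sql: str) -> str:
    return "names of singers older than 30" if "age > 30" in sql else "names of all singers"


def toy_check(question: str, description: str) -> Optional[str]:
    return None if "older than 30" in description else "missing age filter"


result = generate_with_backtranslation(
    "Which singers are older than 30?", "singer(name, age)",
    toy_generate, toy_back_translate, toy_check,
)
print(result.sql, result.aligned, result.rounds)
```

Injecting the agents as callables keeps the control flow separate from any particular model or prompt, and capping the loop with `max_rounds` bounds cost when the back-translation never converges; both are design choices of this sketch rather than details reported in the abstract.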

Paper Type: Long

Research Area: Natural Language Generation

Research Area Keywords: NLP Applications, Machine Learning for NLP, LLM Agents

Contribution Types: NLP engineering experiment

Languages Studied: English, SQL

Submission Number: 2247
