VET: Verifiable Execution Tracing for Reliable Text-to-SQL Generation

ACL ARR 2026 January Submission8238 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Text-to-SQL, Verifiable Execution, Large Language Models, Program Synthesis, Reasoning
Abstract: Large language models (LLMs) have shown remarkable capabilities in text-to-SQL generation, yet existing approaches still hallucinate and lack verification mechanisms. Current methods such as Chain-of-Thought (CoT) and Program-of-Thought (PoT) typically rely on intermediate reasoning that is either purely textual or executed only as a final step, leaving the reasoning process opaque and prone to grounding and logical hallucinations. In this paper, we introduce Verifiable Execution Tracing (VET), a novel reasoning paradigm that replaces unverifiable textual rationales in text-to-SQL with step-wise executable semantics. VET constrains the reasoning process to a candidate schema space and formulates it as a sequence of executable Python steps. Crucially, each step is executed against the real database to produce observable intermediate results, which serve as immediate verification feedback and turn the traditionally opaque generation process into a transparent, debuggable interaction with the actual database contents. Experiments show that VET reaches 70.93% execution accuracy on the BIRD benchmark, with especially strong gains on complex queries, supporting the claim that executable reasoning outperforms purely textual alternatives.
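The step-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `run_traced_steps`, the step format (description, SQL fragment, referenced tables), the toy `schools` table, and the use of SQLite are all assumptions made for illustration. It shows only the general pattern of restricting steps to a candidate schema and executing each one against a database to record an observable intermediate result.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, county TEXT, enrollment INTEGER)")
    conn.executemany(
        "INSERT INTO schools VALUES (?, ?, ?)",
        [("Alpha High", "Kern", 1200), ("Beta High", "Kern", 800)],
    )
    return conn


def run_traced_steps(conn, steps, allowed_tables):
    """Run reasoning steps one at a time and record what each one returns.

    Each step is a (description, sql, tables) tuple. Execution stops at the
    first step that leaves the candidate schema or fails in the database.
    """
    trace = []
    for description, sql, tables in steps:
        # Grounding check: the step may only reference candidate-schema tables.
        unknown = set(tables) - set(allowed_tables)
        if unknown:
            trace.append({"step": description, "ok": False,
                          "error": f"unknown tables: {sorted(unknown)}"})
            break
        try:
            # Execute against the database so the result is observable.
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            trace.append({"step": description, "ok": False, "error": str(exc)})
            break
        trace.append({"step": description, "ok": True, "rows": rows})
    return trace


if __name__ == "__main__":
    demo_steps = [
        ("count schools", "SELECT COUNT(*) FROM schools", ["schools"]),
        ("largest school", "SELECT name FROM schools ORDER BY enrollment DESC LIMIT 1",
         ["schools"]),
    ]
    for entry in run_traced_steps(build_demo_db(), demo_steps, ["schools"]):
        print(entry)
```

In a setup like this, a failed or empty intermediate result would be visible at the step that produced it, rather than only after a final query has been generated.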
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: text-to-text generation, analysis, interactive and collaborative generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 8238