Keywords: Large Language Models, Formal Verification, neuro-symbolic, Text-to-SQL
Abstract: Large Language Models (LLMs) for Text-to-SQL frequently generate formally incorrect queries, yet existing repair methods rely on diagnostically impoverished execution feedback. This forces LLMs to speculate on error causes, undermining reliability. We introduce Post-SQLFix, a neuro-symbolic framework instigates a paradigm shift from ambiguous feedback to verifiable repair, designed to supersede the database execute feedback paradigm.
Our approach canonicalizes any SQL query into a dialect-aware, canonical query structure (CQS) representation. Upon CQS, a symbolic engine performs systematic syntactic, context-free, and context-sensitive semantic analysis to produce a sound diagnosis. We then formalize repair as a constrained synthesis problem: for any detected violation, our engine synthesizes a constrained space of formally verifiable repair plans. This transforms the LLM from an unreliable corrector into a constrained agent tasked with implementing a valid, synthesized plan.
On the BIRD and Spider benchmarks, including multi-dialect subsets, Post-SQLFix boosts execution accuracy by up to +11.6\% and reduces repair iterations by 50\% compared to execution feedback. By replacing ambiguous feedback with formal guarantees, our framework represents a significant step towards building robust and trustworthy AI-driven code generation.
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 12118
Loading