Keywords: Text-to-SQL, refusal gating, answerability detection, unanswerable queries, ambiguity detection, latent-state probing, hallucination detection, LLM safety
Abstract: In LLM-based Text-to-SQL systems, unanswerable and underspecified user queries may generate not only incorrect text but also executable programs that yield misleading results or violate safety constraints, posing a major barrier to safe deployment. Existing refusal strategies for such queries either rely on output-level instruction following, which is brittle due to model hallucinations, or on estimating output uncertainty, which adds complexity and overhead. To address this challenge, we first formalize safe refusal in Text-to-SQL systems as an answerability-gating problem, and then propose LatentRefusal, a latent-signal refusal mechanism that predicts query answerability from intermediate hidden activations of an LLM. We introduce the Tri-Residual Gated Encoder (TRGE), a lightweight probing architecture, to suppress schema noise and amplify the sparse, localized question-schema mismatch cues that indicate unanswerability. Extensive empirical evaluations across diverse ambiguous and unanswerable settings, together with ablations and interpretability analyses, verify the effectiveness of the proposed scheme and demonstrate that LatentRefusal provides an attachable, efficient safety layer for Text-to-SQL systems.
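The core idea of latent-signal answerability gating can be sketched minimally: pool a hidden-state representation of the (question, schema) pair, train a lightweight probe on it, and refuse when the probe flags the query as unanswerable. The sketch below uses a plain logistic-regression probe as a stand-in for TRGE, with synthetic activations; all function names and thresholds are illustrative, not the paper's implementation.

```python
import numpy as np

def train_linear_probe(H, y, lr=0.1, epochs=300):
    """Fit a logistic-regression probe on pooled hidden states.

    H: (n, d) array of pooled LLM activations for (question, schema) inputs.
    y: (n,) binary labels, 1 = unanswerable (should be refused).
    Returns probe weights w (d,) and bias b.
    """
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted P(unanswerable)
        g = p - y                               # logistic-loss gradient signal
        w -= lr * (H.T @ g) / n
        b -= lr * g.mean()
    return w, b

def gate_query(h, w, b, threshold=0.5):
    """Refusal gate: True means the probe predicts unanswerable -> refuse."""
    p = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    return p >= threshold
```

In a real system, `H` would come from an intermediate transformer layer of the Text-to-SQL LLM, and the probe runs as an attachable layer before SQL generation, so refusal costs one small forward pass rather than output-level checks.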
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety and alignment, calibration/uncertainty, probing, semantic parsing, code models, security and privacy, robustness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches for low-compute settings, efficiency
Languages Studied: Chinese, English
Submission Number: 5453