Post Hoc Neuro-Symbolic Verification on Instruction Following of Language Models

15 Sept 2025 (modified: 31 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: neuro-symbolic, instruction following, large language models
TL;DR: We introduce NSVIF, a neuro-symbolic framework for post hoc verification of instruction-following failures in large language models, along with VIFBench, a benchmark for rigorously evaluating LLM adherence to instructions.
Abstract: Large Language Models (LLMs) are increasingly used for real-world problem-solving and decision-making. However, LLMs may fail to follow instructions, exhibiting subtle behavior that is hard to detect and diagnose. The impact of instruction-unfollowing behavior may be further magnified along an LLM agent's reasoning chain. This paper presents NSVIF, a novel framework for post hoc verification of instruction following in LLMs. At its core, NSVIF abstracts instruction-following verification as a Constraint Satisfaction Problem (CSP), where both instructions and LLM outputs are represented as structured constraints, comprising symbolic and neural constraints. NSVIF introduces a neuro-symbolic solver that combines symbolic reasoning with neural inference: the former offers sound logic while the latter detects semantic violations. We curated a comprehensive benchmark, VIFBench, to evaluate instruction-following verifiers, and developed a neuro-symbolic-guided synthesis method to construct data in a scalable and high-quality manner. We show the effectiveness of NSVIF on VIFBench, where NSVIF significantly outperforms existing baselines. Our work shows that unifying symbolic verification with LLM-guided reasoning enables effective, reliable, and interpretable analysis of LLM instruction-following behavior.
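The CSP framing described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): each instruction is decomposed into named constraints over the model output, where symbolic constraints are checked with exact logic and neural constraints would be delegated to an LLM judge. Here the neural check is stubbed with a keyword heuristic; the constraint names and `Constraint`/`verify` interfaces are hypothetical.

```python
# Hypothetical sketch of instruction-following verification as a CSP.
# Symbolic constraints use exact logic; the "neural" constraint below is
# a keyword heuristic standing in for an LLM-based semantic check.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    name: str
    kind: str  # "symbolic" or "neural"
    check: Callable[[str], bool]

def verify(output: str, constraints: list[Constraint]) -> dict[str, bool]:
    """Return per-constraint satisfaction for a single LLM output."""
    return {c.name: c.check(output) for c in constraints}

# Example instruction: "Answer in at most 10 words and mention Paris."
constraints = [
    Constraint("max_10_words", "symbolic", lambda o: len(o.split()) <= 10),
    Constraint("mentions_paris", "neural", lambda o: "paris" in o.lower()),
]

result = verify("The capital of France is Paris.", constraints)
print(result)  # {'max_10_words': True, 'mentions_paris': True}
```

An output that violates any constraint would yield `False` for that entry, localizing which part of the instruction was unfollowed rather than producing a single pass/fail verdict.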
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 5652