Beaver: An Efficient Deterministic LLM Verifier

Published: 02 Mar 2026, Last Modified: 11 Mar 2026
Venue: ICLR 2026 Workshop VerifAI-2
License: CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Large Language Models, Formal Verification, Safety Verification, Constraint Satisfaction, Neural Network Verification, Probabilistic Guarantees, Branch and Bound, LLM Safety, Trustworthy Machine Learning, Semantic Constraints, Model Certification, Robustness Verification
TL;DR: BEAVER is the first practical framework for deterministic LLM verification. It computes probability bounds on semantic constraint satisfaction and achieves 6–8× tighter bounds than baseline methods across correctness, privacy, and security tasks.
Abstract: As large language models (LLMs) transition from research prototypes to production systems, practitioners need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates give an intuition for model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel Token trie and Frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on secure code generation, privacy, and correctness verification tasks across multiple state-of-the-art LLMs. BEAVER achieves 6–8× tighter probability bounds and identifies 3–4× more high-risk instances than baselines under identical computational budgets. This enables precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
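Illustrative sketch (not the authors' implementation): the abstract does not specify the Token trie or Frontier structures, so the Python below only shows the general idea of frontier-based bounding under stated assumptions. The names `toy_model`, `allowed`, and `bound_satisfaction` are hypothetical, the "model" is a fixed three-token distribution, and the frontier is a plain max-probability heap. Because the constraint is prefix-closed, a violating prefix can be pruned for good; mass on completed, satisfying sequences gives a lower bound, and adding the unexplored frontier mass gives an upper bound.

```python
import heapq

EOS = "<eos>"

def toy_model(prefix):
    # Hypothetical stand-in for an LLM: same next-token distribution for every prefix.
    return {"a": 0.5, "b": 0.3, EOS: 0.2}

def allowed(prefix):
    # Prefix-closed constraint (assumed for illustration): the token "b" never appears.
    # Once violated, every extension is violated too.
    return "b" not in prefix

def bound_satisfaction(model, constraint, max_expansions):
    """Return (lower, upper) bounds on P(a complete output satisfies constraint)."""
    lower = 0.0
    # Frontier of unfinished, still-satisfying prefixes, keyed by negative
    # probability so that heapq pops the most probable prefix first.
    frontier = [(-1.0, ())] if constraint(()) else []
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_p, prefix = heapq.heappop(frontier)
        for token, q in model(prefix).items():
            mass = -neg_p * q
            if token == EOS:
                # The prefix already satisfied the constraint, so the finished
                # sequence counts toward the lower bound.
                lower += mass
            elif constraint(prefix + (token,)):
                heapq.heappush(frontier, (-mass, prefix + (token,)))
            # Otherwise the branch is pruned: its mass can never satisfy the constraint.
    # Unexplored frontier mass might still satisfy the constraint.
    upper = lower + sum(-neg_p for neg_p, _ in frontier)
    return lower, upper

if __name__ == "__main__":
    for budget in (0, 1, 5, 10):
        print(budget, bound_satisfaction(toy_model, allowed, budget))
```

For this toy distribution the true satisfaction probability is 0.2 / (1 - 0.5) = 0.4, and the interval stays around that value while shrinking as the expansion budget grows. The choice of which frontier node to expand next (here, the most probable) affects how fast the bounds tighten but not their soundness.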
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 46