Beaver: An Efficient Deterministic LLM Verifier
Track: long paper (up to 8 pages)
Keywords: Large Language Models, Formal Verification, Safety Verification, Constraint Satisfaction, Neural Network Verification, Probabilistic Guarantees, Branch and Bound, LLM Safety, Trustworthy Machine Learning, Semantic Constraints, Model Certification, Robustness Verification
TL;DR: BEAVER is the first practical framework for deterministic LLM verification: it computes sound probability bounds on semantic constraint satisfaction and achieves 6–8× tighter bounds than baseline methods across correctness, privacy, and security tasks.
Abstract: As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide intuition about model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove the soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. Under identical computational budgets, BEAVER achieves 6–8× tighter probability bounds and identifies 3–4× more high-risk instances than baselines. This enables precise characterization of model behavior and risk assessment that loose bounds or empirical evaluation cannot provide.
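To illustrate how bounds can stay sound at every iteration for a prefix-closed constraint, the following minimal sketch may help. It is not BEAVER's algorithm. A plain priority queue of prefixes stands in for the paper's token trie and frontier. The names `toy_lm`, `no_bb`, and `verify` are hypothetical, as are the toy next-token distribution and constraint. The sketch relies on prefix-closedness: once a prefix violates the constraint, every extension also violates it, so that prefix's probability mass can be pruned. Completed (EOS-terminated) valid sequences accumulate into the lower bound. The upper bound is that lower bound plus the mass of the unexpanded valid prefixes, because every satisfying output either is already counted or extends some frontier prefix.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
EOS = "<eos>"


def toy_lm(prefix: Seq) -> Dict[str, float]:
    # Hypothetical next-token distribution standing in for a real LLM.
    # Generation is forced to stop after three tokens.
    if len(prefix) >= 3:
        return {EOS: 1.0}
    return {"a": 0.5, "b": 0.3, EOS: 0.2}


def no_bb(prefix: Seq) -> bool:
    # Hypothetical prefix-closed constraint: never emit "b" twice in a row.
    return all(not (x == "b" and y == "b") for x, y in zip(prefix, prefix[1:]))


def verify(
    lm: Callable[[Seq], Dict[str, float]],
    constraint: Callable[[Seq], bool],
    budget: int,
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Best-first expansion that keeps lower <= P(satisfy) <= upper after every step."""
    lower = 0.0  # mass of completed sequences known to satisfy the constraint
    frontier: List[Tuple[float, Seq]] = [(-1.0, ())]  # max-heap via negated probability
    history: List[Tuple[float, float]] = []
    for _ in range(budget):
        if not frontier:
            break  # every path is resolved, so the bounds are exact
        neg_p, prefix = heapq.heappop(frontier)
        p = -neg_p
        for tok, q in lm(prefix).items():
            child_p = p * q
            if tok == EOS:
                # The parent prefix was already valid, so this completion satisfies.
                lower += child_p
            elif constraint(prefix + (tok,)):
                heapq.heappush(frontier, (-child_p, prefix + (tok,)))
            # Otherwise the prefix violates; by prefix-closedness, so does every
            # extension, and its mass is dropped from the upper bound.
        upper = lower + sum(-n for n, _ in frontier)
        history.append((lower, upper))
    upper = lower + sum(-n for n, _ in frontier)
    return lower, upper, history


if __name__ == "__main__":
    lo, hi, _ = verify(toy_lm, no_bb, budget=3)
    print(round(lo, 2), round(hi, 2))  # 0.36 0.91 after three expansions
```

Here the exact satisfaction probability is 0.865, because the violating outputs "bb…" (0.09) and "abb" (0.045) account for the remaining 0.135. The interval is sound after every expansion and tightens monotonically. Once all 11 valid non-terminal prefixes have been expanded, the lower and upper bounds coincide. Expanding the most probable prefix first is one plausible heuristic for tightening the gap quickly under a fixed budget. The paper's actual exploration strategy may differ.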
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 46