TL;DR: Counterexample-guided proof repair driven by LLMs
Abstract: Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present ExVerus, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, ExVerus automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that ExVerus significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.
Lay Summary: Software is increasingly used in systems where mistakes can have serious consequences, such as cloud infrastructure, security tools, and operating systems. Formal verification is a way to mathematically check that software behaves as intended, but writing these proofs usually requires substantial expert effort. Recent AI systems can help generate such proofs, but when a proof fails, they often receive only vague error messages, making it difficult to understand and fix the problem.
This paper introduces ExVerus, a system that helps AI models repair failed software proofs by showing them concrete examples of what went wrong. Instead of only reporting that a proof failed, ExVerus generates specific failing cases, uses them to guide the AI model toward the source of the problem, and then checks the repaired proof again. Across several benchmarks, ExVerus successfully proves more programs than prior systems, remains more robust when programs are rewritten in different but equivalent ways, and does so at lower computational cost.
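The repair loop described above can be illustrated with a toy sketch. This is not ExVerus's implementation and does not use Verus or an LLM: the transition system (`init`, `step`, `safe`), the bounded `DOMAIN` standing in for a verifier, the fixed `CANDIDATES` list standing in for LLM proposals, and all function names are assumptions made for illustration. Counterexample validation is not modeled separately, since the enumerated states are concrete by construction.

```python
from typing import Callable, List, Optional, Set, Tuple

State = int
Invariant = Callable[[State], bool]
Counterexample = Tuple[str, State]  # (kind of failure, concrete state)

# Hypothetical toy system: x starts at 0, each step adds 2, x must never be 7.
DOMAIN = range(0, 50)  # bounded search standing in for a real verifier

def init(x: State) -> bool:
    return x == 0

def step(x: State) -> State:
    return x + 2

def safe(x: State) -> bool:
    return x != 7

def find_counterexample(inv: Invariant) -> Optional[Counterexample]:
    """Return a concrete state showing why inv is not a safe inductive invariant."""
    for x in DOMAIN:
        if init(x) and not inv(x):
            return ("init", x)        # initial state excluded by inv
        if inv(x) and not safe(x):
            return ("safety", x)      # inv admits an unsafe state
        if inv(x) and not inv(step(x)):
            return ("inductive", x)   # inv is not preserved by one step
    return None

def blocks(inv: Invariant, cex: Counterexample) -> bool:
    """Check that inv no longer fails on a previously seen counterexample."""
    kind, x = cex
    if kind == "init":
        return inv(x)
    if kind == "safety":
        return not inv(x)
    return (not inv(x)) or inv(step(x))

# Fixed candidate pool standing in for LLM-proposed invariants (assumption).
CANDIDATES: List[Tuple[str, Invariant]] = [
    ("true", lambda x: True),
    ("x != 7", lambda x: x != 7),
    ("x % 2 == 0", lambda x: x % 2 == 0),
]

def propose(cexs: List[Counterexample], tried: Set[str]):
    """Pick the next untried candidate that blocks every known counterexample."""
    for name, inv in CANDIDATES:
        if name not in tried and all(blocks(inv, c) for c in cexs):
            return name, inv
    return None

def repair(max_rounds: int = 5):
    """Counterexample-guided loop: propose, check, record the failure, retry."""
    cexs: List[Counterexample] = []
    tried: Set[str] = set()
    for _ in range(max_rounds):
        cand = propose(cexs, tried)
        if cand is None:
            return None, cexs
        name, inv = cand
        tried.add(name)
        cex = find_counterexample(inv)
        if cex is None:
            return name, cexs  # invariant verified on the bounded domain
        cexs.append(cex)
    return None, cexs

if __name__ == "__main__":
    print(repair())
```

In this sketch, each failed candidate contributes one concrete state, and later candidates are filtered against all collected states before being checked again, which mirrors the generate, guide, and recheck cycle at a very small scale.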
Link To Code: https://github.com/claudeyj/exverus
Primary Area: Applications->Everything Else
Keywords: Software Verification, Proof Generation, Large Language Models, Software Engineering
Originally Submitted PDF: pdf
Submission Number: 2523