ExVerus: Verus Proof Repair via Counterexample Reasoning

Jun Yang; Yuechun Sun; Yi Wu; Rodrigo Caridad; Yongwei Yuan; Jianan Yao; Shan Lu; Kexin Pei

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, CC BY 4.0

TL;DR: Counterexample-guided proof repair driven by LLMs

Abstract: Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present ExVerus, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, ExVerus automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that ExVerus significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.
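The loop described in the abstract (fail, find a concrete counterexample, propose a stronger invariant that blocks it, re-check) can be illustrated with a minimal sketch. This is not the ExVerus implementation: it assumes a toy summation loop, uses a bounded brute-force search as a stand-in for the Verus verifier, and uses a fixed list of candidate invariants as a stand-in for the LLM. All names (`find_counterexample`, `repair`, `CANDIDATES`, `BOUND`) are hypothetical.

```python
from itertools import product

# Toy program under "verification" (an assumption for illustration):
#   i = 0; s = 0
#   while i < n: s += i; i += 1
#   assert s == n * (n - 1) // 2
BOUND = 6  # small search bound; a stand-in for a real verifier


def step(i, s, n):
    """One iteration of the loop body."""
    return i + 1, s + i, n


def post(i, s, n):
    """Postcondition that must hold when the loop exits."""
    return s == n * (n - 1) // 2


def find_counterexample(inv):
    """Search small states for a concrete witness that `inv` is not a
    sufficient inductive invariant. Returns (kind, state) or None."""
    for n in range(BOUND):
        if not inv(0, 0, n):  # invariant must hold on loop entry
            return ("init", (0, 0, n))
    for i, s, n in product(range(BOUND), range(BOUND * BOUND), range(BOUND)):
        if not inv(i, s, n):
            continue
        if i < n:
            if not inv(*step(i, s, n)):  # invariant must be preserved
                return ("preserve", (i, s, n))
        elif not post(i, s, n):  # invariant must imply the postcondition
            return ("post", (i, s, n))
    return None


def blocks(inv, cex):
    """A candidate blocks a counterexample if it admits the failing initial
    state, or excludes the failing loop state."""
    kind, state = cex
    return inv(*state) if kind == "init" else not inv(*state)


# Hypothetical proposals, ordered from weakest to strongest.
CANDIDATES = [
    ("bound_only", lambda i, s, n: i <= n),
    ("closed_form", lambda i, s, n: s == i * (i - 1) // 2),
    ("bounded_closed_form",
     lambda i, s, n: i <= n and s == i * (i - 1) // 2),
]


def repair(candidates):
    """Try candidates in order, skipping any that fail to block a
    previously collected counterexample."""
    collected = []
    for name, inv in candidates:
        if not all(blocks(inv, cex) for cex in collected):
            continue
        cex = find_counterexample(inv)
        if cex is None:
            return name, collected
        collected.append(cex)
    return None, collected


name, collected = repair(CANDIDATES)
print(name, collected)
```

In this sketch each counterexample is a concrete state, so checking it only requires evaluating the candidate invariant on that state. A real verifier reasons over all states, whereas the bounded search here only covers small values.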

Lay Summary: Software is increasingly used in systems where mistakes can have serious consequences, such as cloud infrastructure, security tools, and operating systems. Formal verification is a way to mathematically check that software behaves as intended, but writing these proofs usually requires substantial expert effort. Recent AI systems can help generate such proofs, but when a proof fails, they often receive only vague error messages, making it difficult to understand and fix the problem. This paper introduces ExVerus, a system that helps AI models repair failed software proofs by showing them concrete examples of what went wrong. Instead of only reporting that a proof failed, ExVerus generates specific failing cases, uses them to guide the AI model toward the source of the problem, and then checks the repaired proof again. Across several benchmarks, ExVerus successfully proves more programs than prior systems, remains more robust when programs are rewritten in different but equivalent ways, and incurs lower computational cost.

Link To Code: https://github.com/claudeyj/exverus

Primary Area: Applications->Everything Else

Keywords: Software Verification, Proof Generation, Large Language Models, Software Engineering

Submission Number: 2523