Keywords: Large Language Models, Program Verification
Abstract: Ensuring correctness is crucial for code generation. Formal verification offers a definitive assurance of correctness, but it demands substantial human effort in proof construction, raising a pressing need for automation. The primary obstacle lies in the severe scarcity of data: there are far fewer proofs than code snippets for Large Language Models (LLMs) to train on. In this paper, we introduce SAFE, a framework that overcomes the lack of human-written proofs to enable automated proof generation for Rust code. SAFE establishes a self-evolving cycle in which data synthesis and fine-tuning collaborate to enhance model capability, leveraging the definitive power of a symbolic verifier to distinguish correct proofs from incorrect ones. SAFE also repurposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models, empowering them to fix incorrect proofs based on the verifier's feedback. SAFE demonstrates superior efficiency and precision compared to GPT-4o. Through tens of thousands of synthesized proofs and the self-debugging mechanism, we improve the capability of open-source models, initially unacquainted with formal verification, to automatically write proofs for Rust code. This advancement yields a substantial improvement in performance, achieving a 52.52% accuracy rate on a benchmark crafted by human experts, a significant leap over GPT-4o's 14.39%.
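To make the self-evolving cycle described in the abstract concrete, here is a minimal sketch reconstructed from the abstract alone; it is not the authors' implementation, and all names (generate_proof, verify, fine_tune, self_evolve) are hypothetical placeholders. In the paper the verifier is a symbolic verifier for Rust proofs; here it is abstracted as a callback returning a pass/fail verdict and feedback.

```python
# Hypothetical sketch of SAFE's self-evolving cycle (not the authors' API).
# Each round: synthesize proofs, filter them with a symbolic verifier, then
# fine-tune on the correct ones while repurposing the incorrect ones as
# self-debugging training data.

def self_evolve(model, programs, verify, rounds=3):
    """Alternate data synthesis and fine-tuning.

    model    -- an LLM exposing generate_proof() and fine_tune() (assumed)
    programs -- Rust programs to write proofs for
    verify   -- callback: (program, proof) -> (passed: bool, feedback: str)
    """
    for _ in range(rounds):
        correct, debug_pairs = [], []
        for prog in programs:
            proof = model.generate_proof(prog)       # data synthesis
            passed, feedback = verify(prog, proof)   # symbolic verifier
            if passed:
                correct.append((prog, proof))        # fine-tuning data
            else:
                # Repurpose incorrect proofs: teach the model to repair a
                # failing proof given the verifier's feedback.
                debug_pairs.append((prog, proof, feedback))
        model = model.fine_tune(proof_data=correct,
                                self_debug_data=debug_pairs)
    return model
```

At inference time, the same self-debugging capability would let the fine-tuned model iteratively revise a rejected proof using fresh verifier feedback, rather than sampling from scratch.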
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2209