Clover: Closed-Loop Verifiable Code Generation

15 Sept 2023 (modified: 27 Jan 2024) | ICLR 2024 Conference Withdrawn Submission
Keywords: large language models, code generation, dafny, verification
Abstract: The use of large language models for code generation is a rapidly developing trend in contemporary software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of dangerous or even catastrophic outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-loop Verifiable Code Generation. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at checking the correctness of code. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results show that for this dataset, (i) LLMs are reasonably successful at automatically generating formal specifications; and (ii) our consistency checker achieves a promising acceptance rate (>= 75%) for correct instances while maintaining zero tolerance for incorrect ones.
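Illustrative sketch (not from the submission): the abstract describes a checker that performs consistency checks among code, docstrings, and formal annotations, but does not say how those checks are implemented. The Python below is one minimal way to picture that accept/reject logic, assuming the checks are pairwise and that an instance is accepted only when every pair agrees. The names `Artifact`, `PairCheck`, and `check_consistency`, the `Abs` example strings, and the stub checkers are all hypothetical. In a real system each stub would be backed by a formal verifier or an LLM call.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

KINDS = ("code", "docstring", "annotation")


@dataclass(frozen=True)
class Artifact:
    kind: str  # one of KINDS
    text: str


# A pairwise check reports whether two artifacts agree with each other.
# (Hypothetical interface; the real checks would call a verifier or an LLM.)
PairCheck = Callable[[Artifact, Artifact], bool]


def check_consistency(
    artifacts: List[Artifact],
    checks: Dict[FrozenSet[str], PairCheck],
) -> bool:
    """Accept only if every pair of artifacts passes its registered check.

    A missing check counts as a failure, so the checker rejects by default.
    """
    for a, b in combinations(artifacts, 2):
        check = checks.get(frozenset((a.kind, b.kind)))
        if check is None or not check(a, b):
            return False
    return True


# Hypothetical textbook-level instance, held as plain strings.
triple = [
    Artifact("code", "method Abs(x: int) returns (y: int) "
                     "{ if x < 0 { y := -x; } else { y := x; } }"),
    Artifact("docstring", "Returns the absolute value of x."),
    Artifact("annotation", "ensures y >= 0 && (y == x || y == -x)"),
]

# Stub checkers standing in for the verifier- and LLM-backed checks.
pairs = [frozenset(p) for p in combinations(KINDS, 2)]
all_pass: Dict[FrozenSet[str], PairCheck] = {p: (lambda a, b: True) for p in pairs}
one_fail = dict(all_pass)
one_fail[frozenset(("code", "annotation"))] = lambda a, b: False

print(check_consistency(triple, all_pass))  # True
print(check_consistency(triple, one_fail))  # False
```

Requiring every pair to pass, and rejecting when a check is missing, reflects the abstract's stated priority of zero tolerance for incorrect instances, at the cost of sometimes rejecting correct ones.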
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 113