DafnyLLM: Pre-training Dafny Representations with Large Language Models for Code Verification
Track: long paper (up to 8 pages)
Keywords: Code LLM, Formal Verification
TL;DR: We propose DafnyLLM, a specialized language model designed to automate formal verification by bridging the gap between code implementation and logical specifications.
Abstract: Formal verification requires a deep integration of program semantics and logical specifications, posing challenges beyond those addressed by general-purpose code language models. In particular, the Dafny language introduces a tightly coupled syntax of implementation and specification, demanding models that can capture both procedural structure and verification intent. In this work, we propose DafnyLLM, the first pre-trained large language model tailored for the Dafny verification language. DafnyLLM leverages hybrid structural priors, including abstract syntax trees, control/data flow graphs, and assertion flow graphs, to encode rich program and specification semantics. To support Dafny’s long-range dependencies across intertwined definitions, specifications, and proofs, we design a sparse attention mechanism with memory and bridge tokens, enabling efficient and scalable modeling of verification contexts. We pre-train DafnyLLM on a curated corpus of real-world Dafny programs and evaluate it on verification-aware downstream tasks, such as specification-conditioned code retrieval and assertion classification. Experimental results show that DafnyLLM outperforms state-of-the-art code models by a significant margin, demonstrating the importance of incorporating verification-specific structure in code representation learning.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Shentong_Mo1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 55