Improving autoformalization via cycle consistency and incremental type-checking using language-model probabilistic programs

Published: 17 Oct 2025, Last Modified: 21 Nov 2025. MATH-AI 2025 Poster. License: CC BY 4.0
Keywords: autoformalization, cycle consistency, incremental type-checking, language-model probabilistic programs, Lean theorem prover
TL;DR: We probabilistically constrain LMs with Lean type-checks and cycle-consistent backtranslation to boost autoformalization quality.
Abstract: Autoformalization, the task of translating natural-language mathematics into a formal language such as Lean, has the potential to change the nature of mathematical discovery: it can give mathematicians greater certainty via verification during the proof-discovery process and even facilitate automated mathematical reasoning. In recent years, language models have made great strides towards autoformalization, but the task remains challenging both because of the rigidity of the target formal languages and the inherent uncertainty in informal texts. We propose a method for autoformalization via language-model probabilistic programming. Using tools for constrained generation from language models (LMs), we probabilistically steer LMs by incorporating two correctness signals: (i) incremental type-checking in Lean to rule out generations with no valid completion, and (ii) cycle consistency, whereby a backtranslation of the formalized statement is constrained to be similar to the original informal statement. We demonstrate that both of these signals can improve autoformalization quality on the miniF2F and LeanEuclid Book datasets, while requiring fewer tokens and shorter runtime.
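To make the two signals concrete, here is a minimal, self-contained sketch of how they could be combined to filter and rerank candidate formalizations. Everything here is a hypothetical stand-in, not the paper's implementation: a real system would query an LM for candidates and backtranslations and call Lean's elaborator for type-checking, whereas this toy replaces the incremental type-check with a prefix-wise parenthesis-balance check and cycle consistency with a string-similarity ratio.

```python
from difflib import SequenceMatcher


def type_checks(prefix: str) -> bool:
    """Toy incremental check: reject any prefix with unbalanced parentheses.
    (A stand-in for asking Lean whether a partial statement can still elaborate.)"""
    depth = 0
    for ch in prefix:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open: no valid completion
                return False
    return True


def cycle_consistency(informal: str, backtranslation: str) -> float:
    """Toy cycle-consistency score: similarity between the original informal
    statement and a backtranslation of the candidate formalization."""
    return SequenceMatcher(None, informal, backtranslation).ratio()


def rerank(informal: str, candidates: list[tuple[str, str]]) -> list[str]:
    """Keep candidates whose every prefix passes the incremental check,
    then rank survivors by cycle consistency of their backtranslation."""
    survivors = []
    for formal, backtranslation in candidates:
        if all(type_checks(formal[:i]) for i in range(1, len(formal) + 1)):
            survivors.append((cycle_consistency(informal, backtranslation), formal))
    return [formal for _, formal in sorted(survivors, reverse=True)]


informal = "the sum of two even numbers is even"
candidates = [
    # Extra ')' makes a prefix invalid, so this candidate is pruned early:
    ("theorem t (a b : Nat) : Even a -> Even b -> Even (a + b))",
     "the sum of two even numbers is even"),
    # Well-formed candidate with a faithful backtranslation survives:
    ("theorem t (a b : Nat) : Even a -> Even b -> Even (a + b)",
     "the sum of two even numbers is even"),
]
print(rerank(informal, candidates))
```

The key design point the sketch illustrates is that the type-check signal operates on *prefixes*, so invalid generations can be pruned before they are completed, while the cycle-consistency signal scores *finished* candidates; in the paper both are folded into the LM's sampling distribution rather than applied as a post-hoc filter.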
Submission Number: 123