CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

ACL ARR 2025 July Submission1231 Authors

29 Jul 2025 (modified: 03 Sept 2025)ACL ARR 2025 July SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase—the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, first, we propose the CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 509K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations and we hope our CriticLean will provide valuable insights for future advances in formal mathematical reasoning.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: grammatical error correction, code generation and understanding, mathematical NLP

Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: Natural language, Lean 4

Reassignment Request Area Chair: This is not a resubmission

Reassignment Request Reviewers: This is not a resubmission

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: Yes

A2 Elaboration: Appendix A

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Section 3.1

B2 Discuss The License For Artifacts: N/A

B3 Artifact Use Consistent With Intended Use: N/A

B4 Data Contains Personally Identifying Info Or Offensive Content: No

B4 Elaboration: The data used is a publicly available dataset that does not contain personally identifying information. The data content has undergone preprocessing, and it has been confirmed to contain no offensive content. Additionally, the research data includes synthetic data, which does not involve real personal information.

B5 Documentation Of Artifacts: N/A

B6 Statistics For Data: Yes

B6 Elaboration: Section 3.1

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Section 5.1

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: Section 5.1

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 5.2

C4 Parameters For Packages: N/A

D Human Subjects Including Annotators: Yes

D1 Instructions Given To Participants: Yes

D1 Elaboration: Appendix H

D2 Recruitment And Payment: Yes

D2 Elaboration: Appendix C.1

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: N/A

E Ai Assistants In Research Or Writing: No

E1 Information About Use Of Ai Assistants: N/A

Author Submission Checklist: yes

Submission Number: 1231

Loading