CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

ACL ARR 2025 July Submission1231 Authors

29 Jul 2025 (modified: 03 Sept 2025)ACL ARR 2025 July SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase—the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, first, we propose the CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 509K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations and we hope our CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: grammatical error correction, code generation and understanding, mathematical NLP
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Natural language, Lean 4
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Appendix A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3.1
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The data used is a publicly available dataset that does not contain personally identifying information. The data content has undergone preprocessing, and it has been confirmed to contain no offensive content. Additionally, the research data includes synthetic data, which does not involve real personal information.
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: Yes
B6 Elaboration: Section 3.1
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 5.1
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 5.1
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5.2
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Appendix H
D2 Recruitment And Payment: Yes
D2 Elaboration: Appendix C.1
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 1231
Loading