Formal-PRM: A Process Reward Model Based on Formalized Verification

ACL ARR 2026 January Submission9414 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0
Keywords: Process Reward Model, Formal Verification, Test-time Scaling
Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in solving mathematical problems, yet they still make logical and computational errors during problem solving. Process Reward Models (PRMs), which evaluate the correctness of each step in a model-generated solution, are an important approach to detecting such errors. However, existing PRMs suffer from limitations such as poor generalization. This paper therefore proposes a novel framework, Formal-PRM, comprising a Formalizer and a Checker, to formally verify the correctness of solutions generated by LLMs. We empirically investigate the effectiveness of Formal-PRM in two scenarios: 1) Verification: Formal-PRM is used to determine whether a solution to a given problem is correct. 2) Test-time scaling: when Formal-PRM identifies errors in a solution produced by an LLM-based generator, it feeds the Checker's corrective suggestions back to the generator, which then regenerates the solution. We evaluate our framework on the widely used PRM benchmark ProcessBench, demonstrating the superiority of our approach over existing methods. Moreover, our method outperforms existing approaches in test-time scaling.
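The test-time-scaling loop described in the abstract (generate, formalize, check, and regenerate with the Checker's feedback) can be sketched as follows. This is a minimal illustrative sketch: all function names, the stub behaviors, and the round limit are assumptions for exposition, not the authors' actual implementation.

```python
def formalize(step: str) -> str:
    """Formalizer stub: translate a natural-language step into a formal
    statement. A real system would emit e.g. a proof-assistant expression;
    here we just tag the step (hypothetical placeholder)."""
    return f"formal({step})"

def check(formal_step: str) -> tuple[bool, str]:
    """Checker stub: verify the formal statement and return (ok, feedback).
    For illustration, any step containing 'wrong' fails verification."""
    if "wrong" in formal_step:
        return False, f"step failed verification: {formal_step}"
    return True, ""

def generate_solution(problem: str, feedback: str = "") -> list[str]:
    """LLM-based generator stub: returns a list of reasoning steps.
    Given corrective feedback, it 'repairs' the flawed step."""
    if feedback:
        return ["step1", "step2 fixed"]
    return ["step1", "step2 wrong"]

def formal_prm_solve(problem: str, max_rounds: int = 3) -> list[str]:
    """Verify-and-regenerate loop: check each step; on failure, pass the
    Checker's feedback back to the generator and try again."""
    feedback = ""
    steps: list[str] = []
    for _ in range(max_rounds):
        steps = generate_solution(problem, feedback)
        for step in steps:
            ok, feedback = check(formalize(step))
            if not ok:
                break  # regenerate with the Checker's corrective suggestion
        else:
            return steps  # every step passed formal verification
    return steps  # best effort after max_rounds

solution = formal_prm_solve("toy problem")
```

In this toy run, the first generation contains a step the Checker rejects, so the feedback triggers one regeneration that passes verification.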
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9414