Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning

ACL ARR 2025 February Submission 3243 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: The Process Reward Model (PRM) plays a crucial role in mathematical reasoning tasks and requires high-quality supervised process data. However, we observe that reasoning steps generated by Large Language Models (LLMs) often fail to exhibit strictly incremental information, leading to redundancy that can hinder effective reasoning. To address this issue, we propose \model, a simple yet effective coarse-to-fine strategy. Instead of focusing on the detection of redundant steps, our approach first establishes a coarse-grained window to merge adjacent reasoning steps into unified, holistic steps. The window size is then progressively reduced to extract fine-grained reasoning steps, enabling data collection at multiple granularities for training. By leveraging this hierarchical refinement process, \model mitigates redundancy while preserving essential fine-grained knowledge. Extensive experiments on two reasoning datasets across three loss criteria validate \model's effectiveness and versatility. Our code is available at \url{https://anonymous.4open.science/r/CFPRM-0FF2}.
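To make the coarse-to-fine windowing idea from the abstract concrete, the following is a minimal, hypothetical Python sketch. The function names (`merge_steps`, `coarse_to_fine_samples`) and the choice of window sizes are illustrative assumptions, not the authors' released implementation; see the anonymous repository above for the actual code.

```python
# Hypothetical sketch of coarse-to-fine step merging: adjacent reasoning steps
# are first merged under a coarse window, then the window shrinks so that data
# is collected at multiple granularities. Names here are illustrative only.

from typing import List


def merge_steps(steps: List[str], window: int) -> List[str]:
    """Merge adjacent reasoning steps into holistic steps of `window` size."""
    return [
        " ".join(steps[i:i + window])
        for i in range(0, len(steps), window)
    ]


def coarse_to_fine_samples(steps: List[str], windows: List[int]) -> List[List[str]]:
    """Collect step sequences at several granularities.

    `windows` is assumed to be ordered from the coarsest window down to 1,
    where a window of 1 recovers the original fine-grained steps.
    """
    return [merge_steps(steps, w) for w in windows]


if __name__ == "__main__":
    raw_steps = [
        "Let x be the unknown.",
        "Set up the equation 2x + 3 = 7.",
        "Subtract 3 from both sides.",
        "Divide both sides by 2 to get x = 2.",
    ]
    # Coarse pass with window 2, then the fine-grained original steps.
    for granularity in coarse_to_fine_samples(raw_steps, windows=[2, 1]):
        print(granularity)
```

Under this reading, each granularity level would yield its own sequence of supervised process labels for PRM training, which is one plausible way to realize the "data collection at multiple granularities" described above.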
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: Large language model, process reward model
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3243