Abstract: Process supervision has played a crucial role in advancing the complex multi-step reasoning capabilities of Large Language Models (LLMs). However, ensuring high-quality and efficient automatic process annotation remains a challenge. To address this, we introduce \textbf{R}eference-\textbf{E}valuated \textbf{P}rocess \textbf{A}nnotation (\textbf{\texttt{REPA}}), a novel and structured framework that enables per-step annotation in a single stage. \texttt{REPA} evaluates each solution step against one or more ground-truth steps, producing explicit reasoning for each assessment. We show that reference-guided step-level evaluation effectively facilitates process supervision. Our results demonstrate that fine-tuning an instruction-tuned base model and training a reward model on \texttt{REPA} annotations improve reasoning performance under both single greedy decoding and ranking/aggregation of multiple LLM-generated outputs. Notably, we show improvements across four datasets spanning three domains: mathematical reasoning, multi-hop compositional question answering, and spatial reasoning. Our work contributes to reference-guided automatic process supervision, an underexplored direction that holds potential for enhancing LLM reasoning capabilities.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Process Supervision, Process Reward Models, Reward Models, Reference Evaluation, Process Annotation, Reasoning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7469