Abstract: Process supervision has played a crucial role in advancing the complex multi-step reasoning capabilities of Large Language Models (LLMs). However, ensuring high-quality and efficient automatic process annotation remains a challenge. To address this, we introduce \textbf{R}eference-\textbf{E}valuated \textbf{P}rocess \textbf{A}nnotation (\textbf{\texttt{REPA}}), a novel and structured framework that enables per-step annotation in a single stage. \texttt{REPA} evaluates each solution step against one or more ground-truth steps, producing explicit reasoning for each assessment. We show that reference-guided step-level evaluation effectively facilitates process supervision. Our results demonstrate that fine-tuning an instruction-tuned base model and training a reward model on \texttt{REPA} annotations improve reasoning performance under both single greedy decoding and ranking/aggregation of multiple LLM-generated outputs. Notably, we show improvements across four datasets spanning three domains: mathematical reasoning, multi-hop compositional question answering, and spatial reasoning. Our work contributes to reference-guided automatic process supervision, an underexplored direction that holds potential for enhancing LLM reasoning capabilities.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Process Supervision, Process Reward Models, Reward Models, Reference Evaluation, Process Annotation, Reasoning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7469