Reward Inside the Model: A Lightweight Hidden‑State Reward Model for LLM's Best-of-N sampling

Published: 09 Jul 2025, Last Modified: 25 Jul 2025 · AI4Math@ICML25 Poster · CC BY-NC-SA 4.0
Keywords: Large Language Models, Reasoning, Reward Modeling
TL;DR: We develop a highly efficient reward model for mathematical reasoning in LLMs: the Efficient Linear Hidden State Reward (ELHSR) model.
Abstract: High-quality reward models are crucial for unlocking the mathematical reasoning potential of large language models (LLMs), with best-of-N sampling demonstrating significant performance gains. While efficiency is crucial for mathematical discovery, current reward models, which typically operate on the textual output of LLMs, are computationally expensive and parameter-heavy. To address these issues, we introduce the Efficient Linear Hidden State Reward (ELHSR) model, a novel, highly parameter-efficient approach that leverages the rich information embedded in LLM hidden states. ELHSR **systematically outperforms baselines** with **less than 0.005% of their parameters**, requiring only a few samples for training. ELHSR also achieves **orders-of-magnitude efficiency improvements**, with significantly less time and fewer FLOPs per sample than baselines. Moreover, ELHSR performs robustly even when trained only on logits, extending its applicability to some closed-source LLMs. In addition, ELHSR can be combined with traditional reward models to achieve additional performance gains.
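To make the idea concrete, below is a minimal sketch of a linear hidden-state reward head used for best-of-N selection. It assumes a simple formulation in which a single linear layer maps each token's hidden state to a scalar score that is mean-pooled into a per-response reward; the class and function names (`LinearHiddenStateReward`, `best_of_n`) and the pooling choice are illustrative assumptions, not necessarily the exact ELHSR aggregation described in the paper.

```python
import torch
import torch.nn as nn

class LinearHiddenStateReward(nn.Module):
    """Illustrative linear reward head over frozen-LLM hidden states.

    Each token's hidden state is mapped to a scalar score; scores are
    mean-pooled over non-padding tokens to give one reward per response.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        # The only trainable parameters: hidden_size weights + 1 bias,
        # a tiny fraction of a full text-based reward model.
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the frozen LLM
        # mask: (batch, seq_len), 1 for real tokens, 0 for padding
        token_scores = self.score(hidden_states).squeeze(-1)  # (batch, seq_len)
        token_scores = token_scores * mask
        return token_scores.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)  # (batch,)

def best_of_n(rewards: torch.Tensor) -> int:
    """Best-of-N selection: index of the highest-reward candidate response."""
    return int(torch.argmax(rewards).item())
```

Since the hidden states are already computed during generation, scoring each candidate adds only a single linear projection per token, which is consistent with the efficiency gains the abstract claims over text-based reward models.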
Submission Number: 76