Internal Value Functions: Leveraging Hidden States for Efficient Test-Time Scaling in Large Reasoning Models

Published: 16 Oct 2025, Last Modified: 10 Nov 2025 · NeurIPS 2025 ER Workshop · CC BY 4.0
Keywords: process reward models, large reasoning models, large language models, test-time scaling, search, best-of-N, hidden states.
TL;DR: We build a lightweight process reward model from the hidden states of large reasoning models and apply it to test-time search algorithms.
Abstract: Large Reasoning Models (LRMs) generate extensive hidden states during inference, which encode rich information about the input context and probabilistically influence future token predictions. We propose Internal Value Functions (IVF), a novel approach that leverages these hidden states to approximate state-value functions, effectively predicting how likely a partial reasoning trajectory is to converge to the correct answer, without additional inference steps. Unlike traditional Process Reward Models (PRMs), which require separate model evaluations, our method enables efficient implementation of several test-time scaling techniques by extracting predictive signals from intermediate representations already computed during the forward pass. Experimental results on challenging reasoning benchmarks demonstrate that IVF achieves comparable or better performance than external PRMs while significantly reducing computational overhead.
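The abstract's core idea — scoring partial trajectories from hidden states and using those scores for test-time selection — can be sketched roughly as follows. This is a minimal illustrative mock-up, not the paper's architecture: the linear-probe value head, the dimensions, and the function names (`ivf_score`, `best_of_n`) are all assumptions, and random vectors stand in for real model hidden states and learned weights.

```python
import numpy as np

# Hypothetical sketch of an Internal Value Function (IVF): a lightweight
# probe mapping the hidden state at the end of a partial reasoning
# trajectory to an estimated probability of reaching the correct answer.

HIDDEN_DIM = 16  # assumed hidden-state size (real LRMs use thousands)
rng = np.random.default_rng(0)

# Probe parameters; random stand-ins for weights learned on labeled traces.
w = rng.normal(size=HIDDEN_DIM)
b = 0.0

def ivf_score(hidden_state: np.ndarray) -> float:
    """Estimated value of a partial trajectory, in (0, 1)."""
    logit = float(hidden_state @ w + b)
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

def best_of_n(hidden_states: list) -> int:
    """Best-of-N selection: return the index of the highest-scoring candidate."""
    scores = [ivf_score(h) for h in hidden_states]
    return int(np.argmax(scores))

# Score N mock candidate trajectories by their final hidden states.
candidates = [rng.normal(size=HIDDEN_DIM) for _ in range(4)]
chosen = best_of_n(candidates)
print("chosen candidate:", chosen)
```

The key efficiency point from the abstract is that the hidden states being scored are produced anyway during generation, so the only extra cost per candidate is the cheap probe evaluation, rather than a separate PRM forward pass.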
Submission Number: 130