Challenges in Inference-Time Scaling with Uncertainty-Aware Tree Search

Published: 03 Mar 2026, Last Modified: 07 Mar 2026 · CC BY 4.0
Keywords: reasoning, uncertainty quantification, inference-time scaling, process reward models
TL;DR: We attempt to mitigate reward hacking in inference-time search by using uncertainty estimates to guide compute allocation, but find that search optimization exploits these estimates, causing them to become miscalibrated and degrade performance.
Abstract: Inference-time search has emerged as a powerful paradigm for scaling the reasoning capabilities of large language models. Standard approaches, such as beam search, rely on process reward models (PRMs) to provide dense, step-by-step scoring that identifies promising reasoning paths. However, scaling these methods exposes a known failure mode: as compute budgets increase, the search algorithm encounters out-of-distribution states that are spuriously assigned high value, decoupling the proxy reward from actual reasoning ability. To address this issue, we propose Uncertainty-Aware Tree Search (UATS). Rather than relying solely on PRM value estimates, UATS uses a process uncertainty model (PUM) to predict when the value model's predictions are likely to be unreliable. UATS uses PUM predictions to dynamically allocate computational resources, increasing the branching factor at high-uncertainty nodes to resolve ambiguity through exploration. In our empirical evaluation, we find that although PUMs perform well on held-out in-distribution data, strong in-distribution generalization does not translate into effective uncertainty-guided search at inference time. On instruction-tuned models, UATS merely matches standard beam search, whereas on reasoning models it consistently and counterintuitively degrades performance as inference-time compute grows. This failure is an instructive negative result: it suggests that the search-induced distribution shift that causes poor generalization for PRMs also causes poor generalization for process uncertainty models. Our results demonstrate that uncertainty-guided inference-time scaling requires process uncertainty models that remain reliable under search-induced distribution shift.
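To make the compute-allocation idea in the abstract concrete, here is a minimal sketch of how a PUM score could modulate the branching factor inside a beam search. This is not the authors' implementation; the function names (`expand`, `value_fn`, `uncertainty_fn`) and all hyperparameters are illustrative assumptions.

```python
import heapq

def uncertainty_aware_beam_search(expand, value_fn, uncertainty_fn, root,
                                  beam_width=4, base_branch=2, max_branch=8,
                                  depth=3, threshold=0.5):
    """Sketch of UATS-style search: widen the branching factor at nodes
    where a hypothetical process uncertainty model (uncertainty_fn) flags
    the PRM value estimate (value_fn) as unreliable."""
    beam = [root]
    for _ in range(depth):
        candidates = []
        for node in beam:
            # Assumed PUM interface: higher score = less reliable PRM value.
            # Spend more branching compute where the value model is uncertain.
            k = max_branch if uncertainty_fn(node) > threshold else base_branch
            candidates.extend(expand(node, k))
        # Keep the top-scoring partial trajectories under the PRM proxy.
        beam = heapq.nlargest(beam_width, candidates, key=value_fn)
    return max(beam, key=value_fn)
```

In a real system, `expand` would sample continuations from the LLM, `value_fn` would query the PRM, and `uncertainty_fn` the PUM; the abstract's negative result is that the PUM scores themselves become miscalibrated on the states this widened search visits.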
Submission Number: 23