Track: Main Track
Keywords: LLM Inference Scaling, Monte Carlo Methods
TL;DR: We apply Monte Carlo methods to LLM Inference Time Scaling
Abstract: LLM inference-time scaling has emerged as an important paradigm for training-free alignment of LLMs using external reward signals. However, central questions regarding practical deployment, such as answer-selection methods and optimal compute allocation, remain poorly understood, with advances driven primarily by empirical heuristics. To address this, we provide a principled framework for analyzing inference-time scaling via Monte Carlo (MC) sampling. This framework treats inference scaling as a statistical estimation problem over a reward-weighted posterior, and introduces principled choices for response-selection and compute-allocation strategies. Experiments on mathematical reasoning benchmarks show that (i) our MC-derived inference scaling methods outperform baseline strategies, and (ii) our adaptive inference scaling strategy dynamically adjusts compute on a per-query basis, allocating more compute to challenging prompts.
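The abstract's core idea, selecting among N sampled responses according to a reward-weighted posterior, can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's actual estimator: the function name `select_response`, the temperature parameter `beta`, and the softmax weighting `w_i ∝ exp(r_i / beta)` are hypothetical choices standing in for whatever concrete weighting the paper derives.

```python
import math

def select_response(responses, rewards, beta=1.0):
    """Pick a response via self-normalized reward weights.

    Hypothetical sketch: treats the target as a reward-weighted
    posterior, with weights w_i proportional to exp(r_i / beta)
    over the N sampled responses.
    """
    weights = [math.exp(r / beta) for r in rewards]
    z = sum(weights)
    probs = [w / z for w in weights]  # self-normalized weights
    # Deterministic variant: return the highest-weight response
    # (a weighted best-of-N); sampling from `probs` would be the
    # Monte Carlo alternative.
    best = max(range(len(responses)), key=lambda i: probs[i])
    return responses[best], probs

answer, probs = select_response(["A", "B", "C"], [0.1, 0.9, 0.4], beta=0.5)
print(answer)
```

Lowering `beta` concentrates the weights on the highest-reward response, while raising it flattens the distribution toward uniform sampling, which is one plausible knob for trading off reward exploitation against diversity.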
Submission Number: 61