Track: Main Track
Keywords: LLM Inference Scaling, Monte Carlo Methods
TL;DR: We apply Monte Carlo methods to LLM Inference Time Scaling
Abstract: LLM inference-time scaling has emerged as an important paradigm for training-free alignment of LLMs using external reward signals. However, central questions regarding practical deployment, such as answer-selection methods and optimal compute allocation, remain poorly understood, with advances driven primarily by empirical heuristics. To address this, we provide a principled framework for analyzing inference-time scaling via Monte Carlo (MC) sampling. This framework treats inference scaling as a statistical estimation problem over a reward-weighted posterior, and introduces principled choices for response-selection and compute-allocation strategies. Experiments on mathematical reasoning benchmarks show that (i) our MC-derived inference scaling methods outperform baseline strategies, and (ii) our adaptive inference scaling strategy dynamically adjusts compute on a per-query basis, allocating more compute to challenging prompts.
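The abstract's core idea, selecting among N sampled responses according to a reward-weighted posterior, can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's actual estimator: the function name `select_response`, the temperature parameter `beta`, and the softmax weighting `w_i ∝ exp(r_i / beta)` are hypothetical choices standing in for whatever concrete weighting the paper derives.

```python
import math

def select_response(responses, rewards, beta=1.0):
    """Pick a response via self-normalized reward weights.

    Hypothetical sketch: treats the target as a reward-weighted
    posterior, with weights w_i proportional to exp(r_i / beta)
    over the N sampled responses.
    """
    weights = [math.exp(r / beta) for r in rewards]
    z = sum(weights)
    probs = [w / z for w in weights]  # self-normalized weights
    # Deterministic variant: return the highest-weight response
    # (a weighted best-of-N); sampling from `probs` would be the
    # Monte Carlo alternative.
    best = max(range(len(responses)), key=lambda i: probs[i])
    return responses[best], probs

answer, probs = select_response(["A", "B", "C"], [0.1, 0.9, 0.4], beta=0.5)
print(answer)
```

Lowering `beta` concentrates the weights on the highest-reward response, while raising it flattens the distribution toward uniform sampling, which is one plausible knob for trading off reward exploitation against diversity.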
Submission Number: 61