A Bias–Variance Tradeoff Perspective for Improving Test-Time Scaling

Zixuan Hu; Zhenyi Wang; Dacheng Tao

02 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0

Keywords: Test-time scaling, probabilistic inference

Abstract: Parallel test-time scaling (PTTS) improves the reasoning performance of large language models (LLMs) by aggregating multiple candidate solutions at inference time. However, existing methods remain largely heuristic: they lack a principled framework that explains their behavior, clarifies their limitations, and guides systematic improvement. To bridge this gap, we introduce the first general framework for PTTS, a unifying probabilistic inference formulation that recovers previously disparate methods as special cases. The framework enables a novel bias–variance tradeoff perspective that exposes the intrinsic limitations of existing methods and serves as a principled foundation for developing new ones. Specifically, it shows that existing verifier-based methods act as *high-variance* importance sampling (IS) estimators, which yield only marginal gains under small scaling budgets, whereas generator-based methods act as *biased* variational inference (VI) estimators, whose scalability is suppressed even under large budgets. To formally characterize the tradeoff, we derive the first theoretical bias–variance formulation for PTTS, showing that the upper bound on the relative variance is jointly governed by the generator and the verifier through their respective optimality gaps. To mitigate this tradeoff, we build on the framework and derive a theory-driven PTTS method named TSMC-TTS. It instantiates the framework with twisted sequential Monte Carlo (TSMC) and performs EM-like optimization of the generator and verifier, guided by the theoretical variance bound, thereby achieving monotonic variance reduction without compromising unbiasedness. Furthermore, we introduce a new self-evolving TSMC mechanism that alleviates both the reward sparsity and the computational overhead inherent in vanilla TSMC. Theoretical analysis and comprehensive empirical results demonstrate the efficacy of the proposed method.
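To make the importance-sampling reading of verifier-based aggregation concrete, the following is a minimal toy sketch and not the paper's TSMC-TTS method. It assumes each candidate answer comes with a log importance weight (for example, a verifier score); the function names `weighted_vote` and `effective_sample_size` are hypothetical and chosen only for illustration. With equal weights the vote reduces to plain majority voting, and a low effective sample size signals that a few candidates dominate, which is the usual symptom of a high-variance IS estimate.

```python
import math
from collections import defaultdict


def weighted_vote(candidates, log_weights):
    """Self-normalized importance-weighted vote over candidate answers.

    Illustrative assumption: log_weights[i] is a log importance weight
    (e.g. a verifier score) for candidates[i].
    """
    # Subtract the maximum log-weight for numerical stability.
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)

    # Accumulate normalized weight mass per distinct answer.
    mass = defaultdict(float)
    for answer, w in zip(candidates, weights):
        mass[answer] += w / total

    best = max(mass, key=mass.get)
    return best, dict(mass)


def effective_sample_size(log_weights):
    """Standard ESS diagnostic: (sum w)^2 / sum(w^2)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


if __name__ == "__main__":
    answers = ["42", "42", "17", "42"]

    # Equal weights: the vote coincides with majority voting.
    print(weighted_vote(answers, [0.0, 0.0, 0.0, 0.0]))

    # One candidate receives four times the weight of each other candidate.
    skewed = [0.0, 0.0, math.log(4.0), 0.0]
    print(weighted_vote(answers, skewed))
    print(effective_sample_size(skewed))
```

In this sketch a single heavily weighted candidate can overturn the majority answer while the effective sample size drops well below the number of candidates, which illustrates why weight concentration matters when the sampling budget is small.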

Primary Area: foundation or frontier models, including LLMs

Submission Number: 812
