MarkovScale: Towards Optimal Sequential Scaling at Inference Time

12 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Inference-time Scaling, Markov Decision Process, Efficiency Optimality, Probabilistic Framework, LLM Reasoning
Abstract: Sequential scaling is a prominent inference-time scaling paradigm, yet its performance improvements are typically modest and not well understood, largely because prevailing approaches are heuristic and non-principled, obscuring clear optimality bounds. To address this, we introduce a principled framework that models sequential scaling as a two-state Markov process, uncovering its fundamental properties and providing closed-form expressions for key quantities: the conditions under which sequential scaling improves accuracy, the theoretical accuracy upper bound, and the convergence rate. Leveraging this formulation, we develop MarkovScale, a practical system that applies these optimality criteria to achieve a theoretically grounded balance between accuracy and efficiency. Comprehensive experiments across three backbone LLMs and five benchmarks show that MarkovScale consistently outperforms state-of-the-art parallel and sequential scaling methods, representing a significant step toward optimal and resource-efficient inference in LLMs. The source code will be released upon acceptance at https://open-upon-acceptance.
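To make the two-state framing concrete, the following is a minimal sketch of a generic two-state Markov chain over answer correctness. The transition probabilities `p` (wrong to correct) and `q` (correct to wrong) are hypothetical placeholders, not the paper's actual parameterization; the closed-form stationary accuracy and geometric convergence rate shown are standard properties of any such chain.

```python
# Hedged illustration: two-state Markov chain over answer correctness.
# State 0 = wrong, state 1 = correct. p and q are assumed per-revision
# flip probabilities, chosen only for illustration.
p, q = 0.30, 0.05

# Closed-form long-run accuracy (stationary probability of the correct
# state) and geometric convergence rate (magnitude of the second
# eigenvalue of the transition matrix).
stationary_acc = p / (p + q)
rate = abs(1.0 - p - q)

def accuracy_after(t, a0):
    """Accuracy after t sequential revisions, starting from accuracy a0.

    a_t = stationary + (a0 - stationary) * (1 - p - q)^t
    """
    return stationary_acc + (a0 - stationary_acc) * (1.0 - p - q) ** t

print(round(stationary_acc, 4))        # → 0.8571 (accuracy upper bound)
print(round(accuracy_after(10, 0.5), 4))
```

Under these assumed parameters, sequential scaling helps whenever the initial accuracy is below the stationary value, and the gap shrinks geometrically at rate `1 - p - q` per revision.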
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 4508