Keywords: Large Language Models, Test-Time Scaling, Human-in-the-loop
TL;DR: This paper introduces a unified framework of multi-dimensional test-time scaling that integrates context, batch, and turn scaling to extend the capacity of test-time reasoning.
Abstract: Reasoning reinforcement learning (RL) has recently revealed a new scaling effect: test-time scaling. Thinking models such as R1 and o1 improve their reasoning accuracy at test time as the length of the reasoning context increases. However, this effect is fundamentally limited by the context window of the base models, which remains orders of magnitude smaller than the number of tokens consumed during training.
We revisit test-time enhancement techniques through the lens of this scaling effect and introduce a unified framework of multi-dimensional test-time scaling to extend the capacity of test-time reasoning. Beyond conventional context-length scaling, we consider two additional dimensions: batch scaling, where accuracy improves with parallel sampling, and turn scaling, where iterative self-refinement enhances reasoning quality.
Building on this perspective, we propose 3D test-time scaling, which integrates context, batch, and turn scaling. We show that: (1) each dimension demonstrates a test-time scaling effect, but with a bounded capacity; (2) combining all three dimensions substantially improves reasoning performance on challenging testbeds such as IOI, IMO, and CPHO, and further benefits from human preference feedback; and (3) the human-in-the-loop framework naturally extends to a more open-ended domain, i.e., embodied learning, which enables the design of humanoid control behaviors.
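The interplay of batch and turn scaling described above can be illustrated with a minimal sketch. Everything here is a hypothetical placeholder (the `generate` sampler, the `refine` step, and majority-vote aggregation are illustrative stand-ins, not the paper's actual method): the point is only the loop structure of sampling a batch in parallel and then refining each candidate over multiple turns.

```python
import random
from collections import Counter

def generate(prompt):
    # Hypothetical stand-in for one sample from a thinking model:
    # returns a noisy candidate answer around a "true" value of 42.
    return 42 + random.choice([-1, 0, 0, 0, 1])

def refine(prompt, answer):
    # Hypothetical turn-scaling step: one round of self-refinement.
    # A real system would re-prompt the model with its own draft;
    # here we use an identity placeholder.
    return answer

def three_d_scale(prompt, batch=8, turns=3):
    # Batch scaling: draw `batch` candidates in parallel.
    candidates = [generate(prompt) for _ in range(batch)]
    # Turn scaling: iteratively refine each candidate for `turns` rounds.
    for _ in range(turns):
        candidates = [refine(prompt, c) for c in candidates]
    # Aggregate the batch, e.g. by majority vote.
    return Counter(candidates).most_common(1)[0][0]
```

Context scaling would correspond to the length budget given to each `generate`/`refine` call, which this toy omits; a human-in-the-loop variant would replace the majority vote with preference feedback over candidates.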
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4738