ACTS: Adaptive Control for Test-time Scaling

ICLR 2026 Conference Submission11345 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Inference Time Scaling, Efficient Inference, Decoding Strategies
Abstract: Controlling the generation length of Large Language Models (LLMs) presents a difficult trade-off between computational cost and output quality. We tackle this challenge with the ACTS (Adaptive Control for Test-time Scaling) framework, which leverages a novel "termination signal": the probabilities a model assigns to control tokens such as End-of-Sequence. By framing generation as an optimal stopping problem, ACTS uses this signal to dynamically decide when to terminate. Experiments demonstrate that our policies significantly outperform baselines, reducing token usage on conversational tasks by 35% while boosting mathematical reasoning accuracy by 13.3% on AIME and 9.8% on Math 500 through more efficient thinking, achieved simply by changing the token-sampling process. ACTS thus enables a new class of signal-driven, principled control over LLM generation, paving the way for more efficient and adaptable model inference.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11345