CATS: Conformalized Adaptive Test-Time Scaling

Published: 28 Feb 2026 · Last Modified: 04 Apr 2026 · CAO Oral · CC BY 4.0
Keywords: Adaptive test-time scaling, risk-control, conformal prediction
Abstract: Reasoning language models (RLMs) have shown significant improvements through test-time scaling. However, these gains come at the cost of more resource-intensive and slower inference, leading to latencies that directly impact user experience. This cost can be reduced by avoiding overthinking and responding quickly to simple queries. Ideally, models should adapt their thinking strategy to the difficulty of the task. Such adaptivity is increasingly feasible because recent foundation models already provide discrete options for reasoning effort (e.g., low, high). We provide a framework for adaptive test-time scaling with a guarantee of controlled risk. Our method estimates task difficulty based on the probability of success at each reasoning level (while keeping additional cost minimal) and adjusts its output via conformal risk control. Specifically, our method guarantees that the probability of producing an incorrect answer remains below a user-specified tolerance level, while adaptively allocating less compute to easier queries and more to harder ones. Our results demonstrate that an appropriate reasoning level can be selected automatically while ensuring rigorous statistical guarantees.
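The abstract's core mechanism, selecting a reasoning level only when the estimated probability of success clears a conformally calibrated threshold, can be illustrated with a toy sketch. This is not the paper's implementation: the calibration data, the single-level setup, and the `risk` helper are all hypothetical stand-ins, and the threshold search uses a simple finite-sample correction in the spirit of conformal risk control.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set: for each query, an estimated success
# probability p_hat at a given reasoning level, and whether the model's
# answer at that level was actually correct (toy, well-calibrated data).
n = 500
p_hat = rng.uniform(0.0, 1.0, n)
correct = (rng.uniform(0.0, 1.0, n) < p_hat).astype(int)

alpha = 0.1  # user-specified tolerance on the error rate

def risk(lam):
    """Empirical error rate among queries answered at this level,
    i.e. those with estimated success probability >= lam."""
    accept = p_hat >= lam
    if accept.sum() == 0:
        return 0.0
    return 1.0 - correct[accept].mean()

# Scan candidate thresholds and keep those whose finite-sample-adjusted
# empirical risk stays below alpha; pick the smallest (least conservative).
candidates = np.linspace(0.0, 1.0, 101)
valid = []
for lam in candidates:
    m = int((p_hat >= lam).sum())
    adjusted = (risk(lam) * m + 1.0) / (m + 1.0)  # upper-confidence correction
    if adjusted <= alpha:
        valid.append(lam)
lam_star = min(valid) if valid else 1.0

print(f"chosen threshold: {lam_star:.2f}, empirical risk: {risk(lam_star):.3f}")
```

At deployment, a query would be answered at this level only if its estimated success probability exceeds `lam_star`; otherwise it would be escalated to the next reasoning level, where the same calibration logic can be repeated.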
Submission Number: 68