Compute When Worth It: Risk Control for Reasoning on a Compute Budget

Anushri Suresh; Alvin Zhang; Rishi More; William Jurayj; Benjamin Van Durme; Eric Nalisnick; Daniel Khashabi

Published: 16 Oct 2025, Last Modified: 10 Nov 2025; NeurIPS 2025 ER Workshop; CC BY 4.0

Keywords: Test Time Compute; Early Exit; Efficient Reasoning

Abstract: Test-time compute (TTC) offers a powerful way to boost LLM accuracy across diverse tasks, yet its high inference cost makes indiscriminate use impractical. Ideally, TTC should be invoked only when the additional computation is “worth it.” We introduce a principled stopping criterion grounded in Distribution-Free Risk Control (DFRC) that determines when additional reasoning is warranted. The framework uses two complementary thresholds: a lower threshold that skips instances where further computation is unlikely to improve performance within the compute budget, and an upper threshold that halts reasoning once sufficient confidence is reached. This design guarantees user-specified risk control while adaptively allocating computation across instances. Experiments with two open-weight models and four benchmarks show that our approach substantially reduces reasoning cost (by up to 52% on AIME) while keeping accuracy within a narrow margin of full TTC.
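The abstract does not spell out how the thresholds are calibrated. As an illustration only, here is a minimal sketch of one standard DFRC-style way to pick an upper threshold that keeps a risk at level α with probability at least 1 - δ: a fixed-sequence scan with a Hoeffding upper confidence bound, in the spirit of Learn Then Test. Everything here is a hypothetical assumption rather than the paper's procedure: the `Trajectory` record, the rule that the lower threshold is checked once at a first "probe" step, the loss (accuracy lost relative to running the full budget), and all function names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Trajectory:
    """Hypothetical calibration record for one problem.

    confidences[t]: model confidence after reasoning step t (assumed in [0, 1]).
    correct[t]: whether the answer extracted after step t is correct.
    The last step corresponds to spending the full compute budget.
    """
    confidences: List[float]
    correct: List[bool]


def run_policy(traj: Trajectory, lam_low: float, lam_high: float) -> Tuple[int, bool]:
    """Return (steps used, correct at exit) under an assumed two-threshold rule.

    Step 0 acts as a probe: if its confidence is below lam_low, skip further
    reasoning. Otherwise reason until confidence reaches lam_high or the
    budget runs out. A lam_high above 1 therefore means "never halt early".
    """
    if traj.confidences[0] < lam_low:
        return 1, traj.correct[0]
    for t, c in enumerate(traj.confidences):
        if c >= lam_high:
            return t + 1, traj.correct[t]
    last = len(traj.confidences) - 1
    return last + 1, traj.correct[last]


def loss(traj: Trajectory, lam_low: float, lam_high: float) -> float:
    """Assumed loss: 1 if exiting early loses an answer full compute gets right."""
    _, ok = run_policy(traj, lam_low, lam_high)
    return float(traj.correct[-1] and not ok)


def hoeffding_ucb(mean_loss: float, n: int, delta: float) -> float:
    """(1 - delta) upper confidence bound on the mean of n losses in [0, 1]."""
    return mean_loss + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def calibrate_upper(
    trajs: Sequence[Trajectory],
    lam_low: float,
    alpha: float,
    delta: float,
    grid: Sequence[float],
) -> Optional[float]:
    """Fixed-sequence scan from the strictest (largest) lam_high downward.

    Each candidate is accepted only if the Hoeffding UCB of its empirical
    risk is <= alpha; the scan stops at the first failure. Returns the
    loosest accepted threshold, or None if even the strictest one fails.
    """
    n = len(trajs)
    chosen: Optional[float] = None
    for lam in sorted(grid, reverse=True):
        risk = sum(loss(tr, lam_low, lam) for tr in trajs) / n
        if hoeffding_ucb(risk, n, delta) > alpha:
            break
        chosen = lam
    return chosen
```

Testing candidates in a fixed order and stopping at the first failure is what lets each check run at level δ without a multiplicity correction. In this sketch the lower threshold is held fixed; calibrating both thresholds jointly over a 2-D grid would need a multiple-testing scheme suited to that grid (for example, a Bonferroni correction).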

Submission Number: 160