On the Provable Performance Guarantee of Efficient Reasoning Models

Published: 02 Mar 2026, Last Modified: 22 Apr 2026
Venue: ES-Reasoning @ ICLR 2026
Readers: Everyone
License: CC BY 4.0
Keywords: Efficient reasoning, large language model, reasoning acceleration, PAC guarantees
TL;DR: A PAC-based switching rule that uses an upper confidence bound to decide when to skip “thinking,” reducing inference cost while guaranteeing the performance loss stays within a user-specified tolerance.
Abstract: Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. Despite this success, LRMs typically incur high computational costs during deployment, which motivates more efficient inference. A practical way to improve efficiency is to switch the LRM dynamically between thinking and non-thinking modes. However, such approaches often introduce additional reasoning errors and lack statistical guarantees on the performance loss, which are critical for high-stakes applications. In this work, we propose Probably Approximately Correct (PAC) reasoning, which keeps the performance loss within a user-specified tolerance. Specifically, we construct an upper confidence bound on the performance loss and use it to determine a threshold for switching to the non-thinking model. Theoretically, using this threshold to switch between the thinking and non-thinking modes ensures bounded performance loss in a distribution-free manner. Our comprehensive experiments on reasoning benchmarks show that the proposed method reduces computational cost while keeping the performance loss within the user-specified tolerance.
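The thresholding idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes a calibration set of i.i.d. examples, each with an uncertainty score for the non-thinking answer and a loss in [0, 1] incurred if that answer replaces the thinking one. It further assumes a Hoeffding bound as the upper confidence bound and a union bound over a finite grid of candidate thresholds. The names `hoeffding_ucb`, `choose_threshold`, `epsilon`, and `delta` are hypothetical.

```python
import math


def hoeffding_ucb(values, delta):
    """Upper confidence bound on the mean of i.i.d. values in [0, 1].

    Holds with probability at least 1 - delta (Hoeffding's inequality).
    """
    n = len(values)
    return sum(values) / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def choose_threshold(scores, losses, epsilon, delta, candidates):
    """Pick the largest candidate threshold whose loss UCB is at most epsilon.

    scores[i]: uncertainty of the non-thinking answer on calibration item i
               (assumed convention: lower means more confident).
    losses[i]: loss in [0, 1] if the non-thinking answer is used on item i.
    An item skips thinking when its score is <= the threshold.
    Returns None when no candidate is certified (always think).
    """
    per_delta = delta / len(candidates)  # union bound over the grid
    best = None
    for u in sorted(candidates):
        # Loss is only incurred on items routed to the non-thinking mode.
        switched = [l if s <= u else 0.0 for s, l in zip(scores, losses)]
        if hoeffding_ucb(switched, per_delta) <= epsilon:
            best = u
        else:
            # The switched loss is non-decreasing in u, so larger
            # thresholds cannot pass once one has failed.
            break
    return best


if __name__ == "__main__":
    # Synthetic calibration data: the non-thinking answer is wrong
    # exactly when its uncertainty score exceeds 0.5.
    scores = [i / 1000 for i in range(1000)]
    losses = [1.0 if s > 0.5 else 0.0 for s in scores]
    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    print(choose_threshold(scores, losses, epsilon=0.1, delta=0.1, candidates=grid))
```

Under these assumptions, the selected threshold keeps the expected loss from skipping thinking at or below `epsilon` with probability at least `1 - delta` over the calibration draw. The Hoeffding bound is conservative, and a tighter bound would certify larger thresholds and therefore more savings.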
Submission Number: 22