Dynamic Reasoning Budgeting: Adaptive Routing for Token Efficiency in Large Reasoning Models

ACL ARR 2026 January Submission748 Authors

24 Dec 2025 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: LLM Efficiency; parameter-efficient training; prompt strategy
Abstract: Large reasoning models (LRMs) deliver strong performance on complex tasks through multi-step deliberation. However, their reliance on long chains of thought often incurs excessive token consumption and elevated inference cost while yielding only limited accuracy gains. This work frames the inefficiency as a reasoning budget allocation problem: deciding how much computation and how many tokens to invest per instance. To address it, we propose a dynamic reasoning budgeting framework that adaptively routes each input to an appropriate reasoning path according to its estimated difficulty. Specifically, simple problems are handled by a lightweight model under strict length constraints, whereas difficult problems are processed by LRMs with optimized reasoning prompts. On benchmarks covering arithmetic, logic, and commonsense reasoning, our method substantially reduces token usage while preserving or improving accuracy, and it outperforms recent routing and reasoning methods. The results suggest that efficiency in LRMs stems not from universally deeper reasoning chains, but from allocating only the reasoning budget each problem requires.
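The routing idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the difficulty estimator, the threshold, the token budget, and the model-call stubs (estimate_difficulty, call_small_model, call_lrm) are hypothetical placeholders.

```python
# Minimal sketch of difficulty-aware reasoning-budget routing.
# All names, thresholds, and the length heuristic are assumptions,
# not the paper's actual method.

DIFFICULTY_THRESHOLD = 0.5   # assumed cutoff separating easy from hard inputs
SHORT_BUDGET_TOKENS = 128    # assumed strict length cap for the lightweight path

def estimate_difficulty(question: str) -> float:
    """Hypothetical difficulty score in [0, 1]; a real system might use a
    small trained classifier rather than this word-count heuristic."""
    return min(len(question.split()) / 100.0, 1.0)

def call_small_model(question: str, max_new_tokens: int) -> str:
    """Stub standing in for a lightweight model call under a token budget."""
    return f"[short answer (<= {max_new_tokens} tokens) to: {question}]"

def call_lrm(prompt: str) -> str:
    """Stub standing in for a large reasoning model call."""
    return f"[long-form reasoned answer to: {prompt}]"

def route(question: str) -> str:
    """Send easy inputs to the cheap path, hard inputs to the LRM path."""
    if estimate_difficulty(question) < DIFFICULTY_THRESHOLD:
        # Easy case: lightweight model, tight length constraint,
        # no long chain of thought.
        return call_small_model(question, max_new_tokens=SHORT_BUDGET_TOKENS)
    # Hard case: LRM with a reasoning prompt (placeholder wording).
    return call_lrm("Reason step by step, keeping each step concise.\n" + question)

print(route("What is 2 + 2?"))                           # lightweight path
print(route(" ".join(["clause"] * 120) + " Prove it."))  # LRM path
```

In this sketch the router is a single threshold over a scalar difficulty score; the paper's framework presumably learns or tunes this decision, but the control flow, estimate difficulty, then pick the cheapest path that suffices, is the core of the budgeting idea.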
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency; parameter-efficient training; prompt strategy
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 748