DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models
Abstract: Reasoning large language models (RLLMs), such as OpenAI-O3 and DeepSeek-R1, have recently demonstrated remarkable capabilities by performing structured and multi-step reasoning. However, recent studies reveal that RLLMs often suffer from overthinking, i.e., producing unnecessarily lengthy reasoning chains even for simple questions, leading to excessive token consumption and computational inefficiency. Interestingly, we observe that when processing multiple questions in batch mode, RLLMs exhibit more resource-efficient behavior by dynamically compressing reasoning steps for easier problems, due to implicit resource competition. Inspired by this, we propose Dynamic Reasoning Quota Allocation (DRQA), a novel method that transfers the benefits of resource competition from batch processing to single-question inference. Specifically, DRQA leverages batch-generated preference data and reinforcement learning to train the model to allocate reasoning resources adaptively. By encouraging the model to internalize a preference for responses that are both accurate and concise, DRQA enables it to generate concise answers for simple questions while retaining sufficient reasoning depth for more challenging ones. Extensive experiments on a wide range of mathematical and scientific reasoning benchmarks demonstrate that DRQA significantly reduces token usage while maintaining, and in many cases improving, answer accuracy. By effectively mitigating the overthinking problem, DRQA offers a promising direction for more efficient and scalable deployment of RLLMs, and we hope it inspires further exploration into fine-grained control of reasoning behaviors.
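The abstract describes training on batch-generated preference data that favors responses which are both accurate and concise. As a minimal illustrative sketch only (the function names and the toy scoring rule below are hypothetical assumptions, not the paper's actual method), such preference pairs could be built by ranking sampled responses with correctness as the dominant criterion and brevity as a tie-breaker:

```python
# Hypothetical sketch: build preference pairs that prefer accurate,
# concise responses. The scoring rule and names are illustrative
# assumptions, not DRQA's actual construction.

def score(response: str, is_correct: bool, max_len: int = 1000) -> float:
    """Higher is better: correctness dominates, brevity breaks ties."""
    correctness = 1.0 if is_correct else 0.0
    brevity = max(0.0, 1.0 - len(response.split()) / max_len)
    return correctness + 0.001 * brevity  # brevity only breaks ties

def build_preference_pairs(samples):
    """samples: list of (response_text, is_correct) for one question.
    Returns (chosen, rejected) pairs usable for preference training."""
    ranked = sorted(samples, key=lambda s: score(*s), reverse=True)
    chosen = ranked[0][0]  # best response: correct and shortest
    return [(chosen, rejected) for rejected, _ in ranked[1:]]

pairs = build_preference_pairs([
    ("The answer is 4.", True),
    ("Let me reason step by step at great length ... the answer is 4.", True),
    ("The answer is 5.", False),
])
```

Under this toy rule, the short correct answer is always the chosen response, so the resulting pairs penalize both incorrect answers and unnecessarily long correct ones.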
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: LLM Efficiency
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: No. This work explores algorithm training methods and does not involve deployed engineering applications or data production processes, so a discussion of potential risks is not applicable at this stage.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Sections 2–4
B2 Discuss The License For Artifacts: No
B2 Elaboration: No. All artifacts (models, datasets, and code) used in this work are open source and were properly cited. As they are governed by their open-source licenses and not redistributed, a detailed license discussion was not required in the paper.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: No. All artifacts are open source and were used for research as intended, so this was not specifically discussed in the paper.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No. All data are from open-source public datasets and were assumed to be pre-screened and anonymized by their providers.
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: Yes
B6 Elaboration: Sections 2–4
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4
C4 Parameters For Packages: Yes
C4 Elaboration: Section 4
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: No
D3 Elaboration: No. All data are from open-source datasets, and consent was obtained by the original providers.
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 574