Abstract: Recent advances in large language models (LLMs) have demonstrated impressive reasoning capabilities, often achieved through prolonged and computationally intensive inference-time deliberation. However, these extended reasoning sequences can lead to redundancy and inefficiency, a phenomenon known as overthinking.
This paper introduces a lightweight reward mechanism, built on a recent reinforcement learning framework, that promotes efficient reasoning in LLMs by balancing accuracy with brevity.
Our approach combines a length-aware reward with dynamically scheduled accuracy thresholds to mitigate verbosity without sacrificing correctness (sketched below).
Empirical results across six math reasoning benchmarks show that the method reduces output length by more than 50\% while preserving or even improving accuracy and semantic quality.
Comprehensive reasoning behavior analyses further reveal that the method reduces redundant reasoning strategies.
Moreover, our method refines the structure of LLM-generated reasoning traces, promoting concise, high-quality reasoning processes.
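The following is a minimal sketch of how a length-aware reward could be gated by a dynamically scheduled accuracy threshold, in the spirit of the abstract. It is not the authors' exact formulation: the function names, the linear threshold schedule, and the penalty weight are illustrative assumptions.

```python
# Illustrative sketch only: all names, constants, and the linear schedule
# below are assumptions, not the paper's actual reward definition.

def length_aware_reward(is_correct: bool,
                        response_length: int,
                        max_length: int,
                        batch_accuracy: float,
                        accuracy_threshold: float,
                        length_penalty_weight: float = 0.5) -> float:
    """Correctness reward minus a length penalty that is applied only once
    the current accuracy exceeds the scheduled threshold."""
    correctness = 1.0 if is_correct else 0.0
    # Penalize verbosity only when accuracy is already above the threshold,
    # so brevity is never encouraged at the expense of correctness.
    if batch_accuracy >= accuracy_threshold:
        length_penalty = length_penalty_weight * min(response_length / max_length, 1.0)
    else:
        length_penalty = 0.0
    return correctness - length_penalty


def scheduled_threshold(step: int, total_steps: int,
                        start: float = 0.5, end: float = 0.9) -> float:
    """An assumed linear annealing of the accuracy threshold over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + frac * (end - start)


# Example usage with made-up values:
thr = scheduled_threshold(step=2000, total_steps=10000)
r = length_aware_reward(is_correct=True, response_length=800,
                        max_length=2048, batch_accuracy=0.72,
                        accuracy_threshold=thr)
print(f"threshold={thr:.2f}, reward={r:.3f}")
```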
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: LLM Reasoning, LLM Efficiency
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1578