LC-R1: Optimizing Length Compression in Large Reasoning Models

ACL ARR 2025 May Submission 2862 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · License: CC BY 4.0
Abstract: Large Reasoning Models (LRMs) have made great progress on complex reasoning tasks by being trained to generate step-by-step thinking paths. However, their output length also increases drastically with unnecessary reasoning chains (a phenomenon termed "overthinking"), especially when solving simple problems with clear solution paths. This paper introduces three principles for efficient reasoning: Simplicity (minimizing redundant content), Sufficiency (retaining critical reasoning steps), and Accuracy (arriving at correct answers). Motivated by these principles, we introduce LC-R1, a reinforcement learning (RL) algorithm that combines a novel length reward and a compression reward/penalty with the accuracy reward, thereby encouraging compression that preserves both the accuracy and the completeness of the thinking process. Extensive experiments on five mathematical reasoning benchmarks with Distill-Qwen-1.5B/7B as base models demonstrate that LC-R1 outperforms other RL-based and SFT-based methods in both compression rate and accuracy, significantly reducing output tokens with minimal accuracy loss. Our findings provide valuable insights for developing more efficient LRMs that balance computational resource usage with reasoning quality.
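The abstract describes LC-R1's reward as a combination of an accuracy reward, a length reward, and a compression reward/penalty. A minimal sketch of how such a composite reward could be assembled is shown below; the function name, weights, and exact functional forms are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a composite reward in the spirit of the LC-R1
# description (accuracy reward + length reward + compression reward/penalty).
# All names, weights, and functional forms are assumptions for illustration.

def composite_reward(is_correct: bool,
                     output_len: int,
                     reference_len: int,
                     kept_critical_steps: bool,
                     w_len: float = 0.5,
                     w_comp: float = 0.5) -> float:
    # Accuracy reward: 1 if the final answer is correct, else 0.
    r_acc = 1.0 if is_correct else 0.0

    # Length reward: shorter outputs relative to a reference length
    # (e.g., the base model's average) score higher, clipped to [0, 1].
    r_len = max(0.0, min(1.0, 1.0 - output_len / max(reference_len, 1)))

    # Compression reward/penalty: reward compression only when the
    # critical reasoning steps are retained; otherwise penalize it.
    r_comp = 1.0 if kept_critical_steps else -1.0

    # Gate the length/compression terms on correctness so that
    # compression is never rewarded at the expense of accuracy.
    return r_acc + (w_len * r_len + w_comp * r_comp) * r_acc
```

Under this sketch, a correct and well-compressed solution that retains its critical steps receives the highest reward, while a correct but verbose one scores lower and an incorrect one receives no reward at all.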
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Length Compression, Efficiency, Reinforcement Learning
Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 2862