Keywords: Efficient Reasoning, Large Reasoning Models, Reinforcement Learning
Abstract: Large Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from "overthinking", producing excessively long reasoning traces that increase latency and memory usage. Existing LRMs typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose PACE, a dual-level framework for prefix-protected and difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to maintain valid reasoning paths while promoting conciseness. At the group level, a difficulty-aware penalty dynamically scales length constraints with query complexity, preserving exploration for harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) demonstrate that PACE achieves a substantial reduction in token usage (up to 55.7%) while simultaneously improving accuracy (up to 4.1%) on math benchmarks, and generalizes to code, science, and general domains.
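The group-level difficulty-aware penalty described in the abstract can be illustrated with a minimal sketch. All names and constants here are assumptions for illustration, not the paper's implementation: difficulty is proxied by one minus the group's rollout accuracy (GRPO-style), and the token budget widens for harder queries so the penalty mainly curbs redundancy on easy ones.

```python
# Hypothetical sketch (names and constants assumed, not from the paper):
# a length penalty whose token budget scales with query difficulty.

def length_penalty(num_tokens: int, group_accuracy: float,
                   base_budget: int = 512, max_budget: int = 4096,
                   alpha: float = 1e-3) -> float:
    """Penalty subtracted from the reward when a rollout exceeds its budget.

    group_accuracy in [0, 1] acts as an inverse-difficulty proxy:
    low accuracy -> hard query -> generous budget (keep exploring);
    high accuracy -> easy query -> tight budget (curb redundancy).
    """
    difficulty = 1.0 - group_accuracy          # harder queries score higher
    budget = base_budget + difficulty * (max_budget - base_budget)
    return alpha * max(0.0, num_tokens - budget)
```

Under these assumptions, a 600-token trace on an easy query (group accuracy 1.0) is penalized, while the same trace on a hard query (group accuracy 0.0) falls well inside the 4096-token budget and incurs no penalty.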
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: efficient models, LLM Efficiency, reinforcement learning, reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 9466