Keywords: Efficient Reasoning, Large Reasoning Models, Reinforcement Learning
Abstract: Large Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from "overthinking", producing excessively long reasoning traces that increase latency and memory usage. Existing LRMs typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose PACE, a dual-level framework for prefix-protected and difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to maintain valid reasoning paths while promoting conciseness. At the group level, a difficulty-aware penalty dynamically scales length constraints with query complexity, preserving exploration for harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) demonstrate that PACE achieves a substantial reduction in token usage (up to 55.7%) while simultaneously improving accuracy (up to 4.1%) on math benchmarks, and generalizes to code, science, and general domains.
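The group-level difficulty-aware penalty described in the abstract can be illustrated with a minimal sketch. All names and constants here are assumptions for illustration, not the paper's implementation: difficulty is proxied by one minus the group's rollout accuracy (GRPO-style), and the token budget widens for harder queries so the penalty mainly curbs redundancy on easy ones.

```python
# Hypothetical sketch (names and constants assumed, not from the paper):
# a length penalty whose token budget scales with query difficulty.

def length_penalty(num_tokens: int, group_accuracy: float,
                   base_budget: int = 512, max_budget: int = 4096,
                   alpha: float = 1e-3) -> float:
    """Penalty subtracted from the reward when a rollout exceeds its budget.

    group_accuracy in [0, 1] acts as an inverse-difficulty proxy:
    low accuracy -> hard query -> generous budget (keep exploring);
    high accuracy -> easy query -> tight budget (curb redundancy).
    """
    difficulty = 1.0 - group_accuracy          # harder queries score higher
    budget = base_budget + difficulty * (max_budget - base_budget)
    return alpha * max(0.0, num_tokens - budget)
```

Under these assumptions, a 600-token trace on an easy query (group accuracy 1.0) is penalized, while the same trace on a hard query (group accuracy 0.0) falls well inside the 4096-token budget and incurs no penalty.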
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: efficient models, LLM Efficiency, reinforcement learning, reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 9466