Keywords: LLM Reasoning, LLM Efficiency
Abstract: Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chains of thought, but this “overthinking” incurs high latency and cost without commensurate accuracy gains. In this work, we introduce AALC, a lightweight, accuracy-aware
length penalty integrated into reinforcement learning that dynamically balances correctness and brevity during training. By incorporating validation accuracy into the reward and employing a dynamic scheduling mechanism, AALC delays the length penalty until target performance is met. Through extensive experiments across standard and out-of-distribution math benchmarks, we show that our approach
reduces response length by over 50% while maintaining or even improving the original accuracy. Furthermore, qualitative analysis reveals that our method curbs redundant reasoning patterns such as excessive subgoal setting and verification, leading to structurally refined outputs rather than naive truncation. We also observe that efficiency gains come at some cost to interpretability: models trained with AALC omit some narrative framing and explanatory context. These findings highlight the potential of reward-based strategies to guide LRMs toward more efficient, generalizable reasoning paths.
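The abstract describes a reward that gates a length penalty on validation accuracy. A minimal sketch of such an accuracy-gated reward is shown below; the function name, the linear penalty form, and the `alpha` scaling are illustrative assumptions, not the paper's exact formulation.

```python
def accuracy_gated_reward(correct: bool, length: int, max_length: int,
                          val_accuracy: float, target_accuracy: float,
                          alpha: float = 0.5) -> float:
    """Illustrative accuracy-gated length penalty (hypothetical form).

    The length penalty is deferred until validation accuracy reaches the
    target, so early training optimizes correctness alone.
    """
    base = 1.0 if correct else 0.0
    # Gate: no length penalty before the accuracy target is met.
    if val_accuracy < target_accuracy:
        return base
    # Penalize only correct responses, proportional to relative length.
    penalty = alpha * min(length / max_length, 1.0) if correct else 0.0
    return base - penalty
```

Under this sketch, a correct but long response earns less reward than a correct concise one once the model is accurate enough, pushing the policy toward brevity without sacrificing correctness.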
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: mathematical reasoning, LLM Efficiency
Languages Studied: English
Submission Number: 8347