LACONIC: Length-Aware Constrained Reinforcement Learning for LLM

ICLR 2026 Conference Submission 21571 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large language model, reinforcement learning, constrained RL, LLM alignment, RL fine-tuning, length-aware LLM
Abstract: Reinforcement learning (RL) has enhanced the capabilities of large language models (LLMs) by enabling self-evolution through reward-driven training. Nevertheless, this process can introduce excessively long responses that inflate inference latency and computational overhead. To address this issue, existing RL-based length control methods often incorporate fixed penalties or heuristic reward shaping to encourage outputs of a desired length. However, such strategies may misalign the optimization objective with the underlying task, resulting in suboptimal performance and limited generalization across model architectures and datasets. In this work, we propose \texttt{LACONIC}, a lightweight reinforcement learning method that enforces a target token budget during training. Specifically, we update policy models using an augmented objective that combines the task reward with a length-based cost applied only to tokens exceeding the specified budget. Furthermore, to balance brevity and task performance, the cost scale is adjusted online throughout training. This formulation directly optimizes task reward subject to an explicit token budget constraint, delivering precise and performance-preserving length control. Across mathematical reasoning models and datasets, \texttt{LACONIC} preserves or improves \texttt{pass@1} while reducing output length by up to 43\%. It maintains out-of-domain performance on general knowledge and multilingual benchmarks with a 44\% reduction in tokens. Moreover, \texttt{LACONIC} integrates into standard RL fine-tuning with no inference changes and minimal deployment overhead.
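The abstract describes an augmented objective: the task reward minus a length-based cost charged only on tokens beyond a target budget, with the cost scale adjusted online during training. The Python sketch below illustrates one plausible reading of that formulation under stated assumptions: the function names, the step size, and the dual-ascent-style update of the cost scale are illustrative choices, not the authors' implementation.

```python
import numpy as np

def augmented_reward(task_reward, response_length, budget, lam):
    """Task reward minus a cost on tokens that exceed the budget.

    Only the overflow beyond `budget` is penalized, so responses that
    stay within the budget are optimized purely for task reward.
    """
    overflow = max(0, response_length - budget)
    return task_reward - lam * overflow

def update_cost_scale(lam, batch_lengths, budget, step_size=1e-3):
    """Online adjustment of the cost scale (a dual-ascent-style rule,
    assumed here for illustration).

    If the batch's average length exceeds the budget, the penalty
    grows; otherwise it shrinks, clipped at zero.
    """
    avg_overflow = np.mean([length - budget for length in batch_lengths])
    return max(0.0, lam + step_size * avg_overflow)

# Toy usage: a batch of (task_reward, response_length) pairs.
lam = 0.0
budget = 512
batch = [(1.0, 640), (0.0, 480), (1.0, 700)]

rewards = [augmented_reward(r, n, budget, lam) for r, n in batch]
lam = update_cost_scale(lam, [n for _, n in batch], budget)
print(rewards, lam)
```

In this reading, responses under the budget incur no penalty at all, which is what lets the method preserve task reward while trimming only the excess length; the online update of `lam` trades off brevity against performance as training proceeds.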
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21571