Keywords: large language model, reasoning, concise reasoning
Abstract: Concise reasoning in large language models seeks to generate only the essential intermediate steps needed to arrive at a final answer, thereby alleviating overthinking. Most existing approaches hinge on carefully hand-crafted heuristics and struggle to balance concision with performance, often failing to adapt across domains and model scales. In this work, we address these challenges by introducing a principled and pragmatic strategy, performance-aware length updating (PALU).
As a principled algorithm, PALU formulates concise reasoning as a constrained optimization problem, minimizing response length subject to a performance constraint, and then applies *Lagrangian* optimization to convert it into a tractable unconstrained problem.
As a pragmatic solution, PALU streamlines the complicated update rules through three approximations: *(i)* estimating performance with off-policy rollouts, *(ii)* truncating the *Lagrange* multiplier to two extremes, and *(iii)* replacing gradient-based updates with quantile-driven length adjustments. Averaged over five benchmarks, PALU reduces output length by 65\% while improving accuracy by 15\% when applied to *DeepSeek-Distill-Qwen-1.5B*, outperforming a range of alternative methods. Furthermore, PALU adapts across domains (logic, STEM, and math) and model scales (1.5B, 7B, 14B), establishing it as a practical and effective concise-reasoning approach.
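The constrained formulation and its Lagrangian relaxation described above can be sketched as follows; the symbols ($\theta$, $L$, $\mathrm{Acc}$, $\epsilon$, $\lambda$) are illustrative placeholders, not notation taken from the paper:

$$
\min_{\theta} \; \mathbb{E}_{y \sim \pi_\theta}\!\left[ L(y) \right]
\quad \text{s.t.} \quad \mathrm{Acc}(\pi_\theta) \geq \mathrm{Acc}_0 - \epsilon
$$

$$
\min_{\theta} \; \max_{\lambda \geq 0} \; \mathbb{E}_{y \sim \pi_\theta}\!\left[ L(y) \right] + \lambda \left( \mathrm{Acc}_0 - \epsilon - \mathrm{Acc}(\pi_\theta) \right)
$$

Here $L(y)$ is the response length, $\mathrm{Acc}_0$ a reference accuracy, and $\epsilon$ a tolerance; the Lagrange multiplier $\lambda$ trades off length against the performance constraint.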
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19020