Efficient RL Training for Reasoning Models via Length-Aware Optimization

Danlong Yuan; Tian Xie; Shaohan Huang; Zhuocheng Gong; Huishuai Zhang; Chong Luo; Furu Wei; Dongyan Zhao

Published: 16 Oct 2025, Last Modified: 10 Nov 2025

Venue: NeurIPS 2025 ER Workshop Spotlight

License: CC BY 4.0

Keywords: llm reasoning, reinforcement learning, long to short reasoning

TL;DR: We propose a simple method to shorten the reasoning length during the reinforcement learning stage of large reasoning models while maintaining or even improving performance.

Abstract: Long reasoning models, such as OpenAI o1 and DeepSeek R1, have demonstrated remarkable performance on reasoning tasks, but they often produce long reasoning paths that incur significant memory and time costs. Existing methods primarily shorten reasoning paths by introducing additional training data and stages. In this paper, we propose three critical reward designs integrated directly into the rule-based reinforcement learning process of long reasoning models, which reduce response length without extra training stages. Experiments in four settings show that our method significantly decreases response length while maintaining or even improving performance. Specifically, in a logic reasoning setting, we achieve a 40% reduction in response length averaged by steps alongside a 14% gain in performance. For math problems, we reduce response length averaged by steps by 33% while preserving performance.

Submission Number: 15
