Keywords: LLM reasoning, reinforcement learning, long-to-short reasoning
TL;DR: We propose a simple method to shorten the reasoning length during the reinforcement learning stage of large reasoning models while maintaining or even improving performance.
Abstract: Long reasoning models have demonstrated remarkable performance on reasoning
tasks but often produce long reasoning paths that incur significant memory and time
costs. Existing methods primarily aim to shorten reasoning paths by introducing
additional training data and training stages. In this paper, we propose three critical
reward designs that are integrated directly into the rule-based reinforcement learning
process of long reasoning models and reduce response length without extra training
stages. Experiments on four settings show that our method significantly decreases
response length while maintaining or even improving performance. Specifically, in
a logic reasoning setting, we achieve a 40% reduction in step-averaged response
length alongside a 14% gain in performance. For math problems, we reduce
step-averaged response length by 33% while preserving performance.
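The abstract does not spell out the three reward designs, but the general idea of folding a length term into a rule-based RL reward can be illustrated with a minimal sketch. The function below is a hypothetical example, not the paper's method: `length_shaped_reward`, the correctness check, and the `max_len` / `alpha` parameters are assumptions introduced only for illustration.

```python
# Hypothetical sketch: combining a rule-based correctness reward with a
# length-shaping term during RL for a long reasoning model.
# All names and parameter values are illustrative assumptions,
# not the reward designs proposed in the paper.

def length_shaped_reward(response: str, is_correct: bool,
                         max_len: int = 4096, alpha: float = 0.2) -> float:
    """Reward correct final answers and gently discount overly long responses."""
    # Rule-based correctness term: 1 for a correct final answer, 0 otherwise.
    correctness = 1.0 if is_correct else 0.0

    # Length term: fraction of the token budget used, clipped to [0, 1].
    # Word count stands in for a real tokenizer here.
    length_ratio = min(len(response.split()) / max_len, 1.0)

    # Discount only correct responses, so the model is never pushed to be
    # short at the expense of correctness.
    penalty = alpha * length_ratio if is_correct else 0.0
    return correctness - penalty


if __name__ == "__main__":
    short_correct = "The answer is 42."
    long_correct = "step " * 5000 + "The answer is 42."
    print(length_shaped_reward(short_correct, is_correct=True))   # close to 1.0
    print(length_shaped_reward(long_correct, is_correct=True))    # 1.0 - alpha
    print(length_shaped_reward(long_correct, is_correct=False))   # 0.0
```

In this sketch the length discount is applied only to correct responses, a common design choice when shaping rewards so that brevity is never traded against answer accuracy.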
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 3368