Keywords: Large Language Models, Large Reasoning Models, Adaptive Reasoning
TL;DR: We propose Adaptive Reasoning Model (ARM), a reasoning model capable of adaptively selecting appropriate reasoning formats based on the task at hand.
Abstract: While large reasoning models demonstrate strong performance on complex tasks, they lack the ability to adjust reasoning token usage based on task difficulty.
This often leads to the “overthinking” problem—excessive and unnecessary reasoning—which, although potentially mitigated by human intervention to control the token budget, still fundamentally contradicts the goal of achieving fully autonomous AI.
In this work, we propose the Adaptive Reasoning Model (ARM), capable of adaptively selecting appropriate reasoning formats based on the task at hand.
These formats include three efficient ones—Direct Answer, Short CoT, and Code—as well as a more elaborate format, Long CoT.
To train ARM, we introduce Ada-GRPO, an adaptation of Group Relative Policy Optimization (GRPO), which addresses the format collapse issue in traditional GRPO.
Ada-GRPO enables ARM to achieve high token efficiency, reducing tokens by an average of $\sim$30\% and up to $\sim$70\%, while maintaining performance comparable to models that rely solely on Long CoT.
All resources will be released.
Submission Number: 31