AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models

18 Sept 2025 (modified: 27 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Efficient Reasoning, LLM, Reasoning Models, Overthinking
Abstract: Reasoning-capable large language models (LLMs) demonstrate strong performance on complex reasoning tasks but often suffer from overthinking after distillation, generating unnecessarily long chain-of-thought (CoT) reasoning paths for easy questions and thereby increasing inference cost and latency. Recent work largely applies reinforcement learning to shorten reasoning paths in models that already possess reasoning capability. However, these approaches generalize poorly to non-reasoning LLMs: they assume initial reasoning ability and rely on sparse, outcome-based rewards that make optimization unstable and limit effective learning. In this paper, we propose Auto Long-Short Reasoning (AutoL2S), a dynamic and model-agnostic framework that enables LLMs to adaptively adjust reasoning length according to input complexity, specifically targeting the stage at which non-reasoning LLMs are turned into reasoning-capable but efficient ones via distillation. AutoL2S introduces a learned mechanism in which LLMs are trained on data annotated with both long and short CoT paths, together with a special <EASY> token that signals when long reasoning can be skipped. During inference, the <EASY> token indicates when the model can skip generating lengthy CoT reasoning. We further extend the framework with AutoL2S-Plus, which uses AutoL2S as a reference model in a length-aware fine-tuning objective that calibrates expected reasoning length, enabling further efficiency gains without loss of accuracy. We show theoretically and empirically that joint training on long and short CoT paths not only enables dynamic reasoning but also improves short CoT generation through knowledge transfer from longer CoT paths. AutoL2S reduces reasoning length by up to 70% without sacrificing performance, establishing it as an effective framework for scalable and efficient LLM reasoning.
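To make the <EASY>-token mechanism concrete, below is a minimal inference sketch, not the authors' released code. It assumes an AutoL2S-style checkpoint whose vocabulary includes <EASY> as an added special token emitted at the start of short-path responses; the checkpoint path, the `answer` helper, and the exact token placement are illustrative assumptions.

```python
# Hedged sketch of <EASY>-gated inference for an AutoL2S-style model.
# Assumptions (not from the paper text): the checkpoint path is hypothetical,
# and <EASY> is assumed to appear at the start of short-CoT generations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/autol2s-checkpoint"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

def answer(question: str, max_new_tokens: int = 2048) -> str:
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep special tokens so the <EASY> marker remains visible.
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=False)
    if text.lstrip().startswith("<EASY>"):
        # The model chose the short reasoning path: the rest of the
        # generation is already the concise answer, no long CoT follows.
        return text.replace("<EASY>", "", 1).strip()
    return text  # long chain-of-thought path for harder inputs
```

Because the routing decision is just a token the model itself emits, no external difficulty classifier or second forward pass is needed; a single `generate` call covers both the short and long reasoning paths.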
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12320