Large Reasoning Models Know How to Think Efficiently

Published: 11 Jun 2025, Last Modified: 10 Jul 2025 | ES-FoMo III | CC BY 4.0
Keywords: Large Language Model, Large Reasoner Model, Efficient Reasoning
TL;DR: This paper presents two training-free methods, Pre-judged Reasoning and Fallback Reasoning, that reduce the token usage of Large Reasoning Models while maintaining accuracy. Results show up to 26.6% fewer generated tokens with no loss in accuracy, improving LRM inference efficiency.
Abstract: Large Reasoning Models (LRMs) show strong problem-solving ability through extended Chain-of-Thought (CoT) generation, improving robustness and accuracy by iteratively revisiting the user prompt. However, excessive CoT generation burdens LLM inference: prolonged decoding of redundant tokens creates computational bottlenecks. This paper introduces two training-free self-thinking methods, Pre-judged Reasoning and Fallback Reasoning, which improve inference efficiency by dynamically selecting between fast thinking and full reasoning based on LRMs' intrinsic ability to classify task complexity. Evaluations on the MATH500 and AIME24 datasets demonstrate that Pre-judged Reasoning reduces token generation by up to 26.6% compared to slow reasoning without compromising accuracy. Similarly, Fallback Reasoning cuts generated tokens by up to 24.0%, enabling significantly faster task completion. Both methods substantially reduce computational overhead while preserving the accuracy of LRMs.
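The dynamic selection the abstract describes could be wired up roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the `generate` stub, the prompts, the "easy"/"hard" labels, and the token budgets are all assumptions introduced here for illustration.

```python
# Illustrative sketch of Pre-judged Reasoning: first ask the model for a
# cheap judgment of task complexity, then budget decoding accordingly.
# `generate` is a toy stand-in for a real LRM generation call (assumption).

def generate(prompt: str, max_tokens: int) -> str:
    # Toy stub so the sketch runs end to end; a real system would call an LRM.
    if "easy or hard" in prompt:
        return "easy" if "2 + 2" in prompt else "hard"
    return "answer (decoded with budget %d)" % max_tokens

def pre_judged_reasoning(question: str) -> str:
    # Step 1: cheap self-judgment of task complexity (fast thinking).
    verdict = generate(
        f"Is this problem easy or hard? Answer in one word.\n{question}",
        max_tokens=4,
    )
    # Step 2: short direct answer for easy tasks, long CoT budget for hard ones
    # (256 and 4096 are illustrative numbers, not values from the paper).
    budget = 256 if verdict.strip() == "easy" else 4096
    return generate(f"Solve: {question}", max_tokens=budget)
```

An easy query would take the short-budget path, while a hard one would fall through to the full reasoning budget; the classification step itself costs only a few tokens.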
Submission Number: 100