ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models

ICLR 2026 Conference Submission 22226 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Large Reasoning Language Models; LRMs; Efficient Reasoning; On-Device; Energy-Efficient; Token-Efficient
TL;DR: We propose Adaptive Reasoning Suppression (ARS), a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring.
Abstract: Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities on complex reasoning tasks, but suffer from significant computational inefficiency due to the overthinking phenomenon. Existing efficient reasoning methods struggle to balance reasoning quality against inference cost. We propose \textbf{Adaptive Reasoning Suppression (ARS)}, a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving superior efficiency compared to static suppression methods. Our extensive evaluation across mathematical reasoning benchmarks and multiple model architectures demonstrates that ARS achieves token, latency, and energy reductions of up to 53\%, 46.1\%, and 57.9\%, respectively, while maintaining or improving accuracy.
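
The abstract describes the mechanism only at a high level; the sketch below illustrates one plausible reading of "multi-checkpoint certainty estimation with progressive suppression thresholds", not the authors' actual implementation. The model interface (generate_step, estimate_certainty, force_answer), the checkpoint interval, and the threshold schedule are all illustrative assumptions.

```python
# Minimal sketch, assuming a single-step generation API and a scalar
# certainty estimate (e.g., the probability mass on an answer token).
# All names and hyperparameters below are hypothetical.

def adaptive_reasoning_suppression(
    model,
    prompt,
    checkpoint_interval=64,   # tokens generated between certainty checks
    base_threshold=0.95,      # initial certainty required to stop reasoning
    decay=0.05,               # how much the threshold relaxes per checkpoint
    min_threshold=0.70,       # floor for the progressive threshold
    max_tokens=4096,
):
    """Generate a reasoning trace, terminating early once the model's
    certainty exceeds a progressively relaxed threshold."""
    trace = []
    threshold = base_threshold
    for step in range(max_tokens):
        token = model.generate_step(prompt, trace)   # assumed single-token step
        trace.append(token)

        # Estimate certainty only at periodic checkpoints to keep overhead low.
        if (step + 1) % checkpoint_interval == 0:
            certainty = model.estimate_certainty(prompt, trace)
            if certainty >= threshold:
                break  # suppress the remaining, likely redundant, reasoning
            # Progressively relax the threshold so long traces terminate sooner.
            threshold = max(min_threshold, threshold - decay)

    # Assumed helper: prompt the model to emit its final answer now.
    return model.force_answer(prompt, trace)
```

Because the threshold decreases with trace length, short traces must reach high certainty to stop early, while long traces are cut off more aggressively; this is one way a progressive schedule could trade off the token, latency, and energy savings reported above against accuracy.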
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22226