Wait, Do We Really Need to "Wait"? Towards Training-Free Efficient Reasoning in R1-style Models

ACL ARR 2025 May Submission 2838 Authors

19 May 2025 (modified: 03 Jul 2025) · License: CC BY 4.0
Abstract: Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by 27%–51% across five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
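The abstract describes NoWait only at the level of suppressing self-reflection tokens at inference time, without specifying an implementation. A minimal sketch of one standard way to achieve this with Hugging Face Transformers' `bad_words_ids` decoding constraint follows; the model name, word list, and prompt here are illustrative assumptions, not details from the paper.

```python
# Sketch: suppress explicit self-reflection tokens ("Wait", "Hmm") during
# generation, in the spirit of NoWait. Assumptions: an R1-style model on the
# Hugging Face Hub and the standard `bad_words_ids` mechanism, which assigns
# -inf logits to the banned token sequences so they can never be emitted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Variants with a leading space are included because many tokenizers encode
# "Wait" and " Wait" as different token sequences.
reflection_words = ["Wait", " Wait", "Hmm", " Hmm"]
bad_words_ids = [
    tokenizer(w, add_special_tokens=False).input_ids
    for w in reflection_words
]

prompt = "Solve step by step: what is 17 * 23?"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    bad_words_ids=bad_words_ids,  # banned sequences get -inf logits
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the suppression happens entirely in the decoding loop, this kind of approach is plug-and-play: it requires no fine-tuning and works with any model whose tokenizer exposes the reflection tokens.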
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality, Vision Question Answering, Cross-modal Application
Contribution Types: Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 2838