Demystifying Hybrid Thinking: Can LLMs Truly Switch Between Think and No-Think?

ICLR 2026 Conference Submission5920 Authors

15 Sept 2025 (modified: 03 Dec 2025) · CC BY 4.0
Keywords: LLM Reasoning, Hybrid Thinking
TL;DR: An empirical study of hybrid thinking identifies key training factors and introduces a recipe that improves controllability.
Abstract: Hybrid thinking enables LLMs to switch between reasoning and direct answering, offering a balance between efficiency and reasoning capability. Yet our experiments reveal that current hybrid thinking LLMs achieve only partial mode separation: reasoning behaviors often leak into the no-think mode. To understand and mitigate this, we analyze the factors influencing controllability and identify four that matter most: (1) larger data scale, (2) using think and no-think answers drawn from different questions rather than the same question, (3) a moderate increase in the amount of no-think data, and (4) a two-phase strategy that first trains reasoning ability and then applies hybrid thinking training. Building on these findings, we propose a practical recipe that, compared to standard training, maintains accuracy in both modes while significantly reducing no-think output length (from 1085 to 585 on MATH500) and occurrences of reasoning-supportive tokens such as "wait" (from 5917 to 522 on MATH500). Our findings highlight the limitations of current hybrid thinking and point the way toward enhancing its controllability.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5920