Keywords: LLM, Reasoning Model, Overthinking, Abstain
TL;DR: Reasoning models fail to think critically when a premise is missing.
Abstract: We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, increases drastically on ill-posed questions with missing premises (MiP), resulting in redundant and ineffective thinking.
Such failures contradict the ``test-time scaling law'' but are widely observed across multiple MiP datasets we curated, indicating the harm of cheap overthinking and a lack of critical thinking.
Surprisingly, LLMs not specifically trained for reasoning exhibit much stronger critical thinking, producing far shorter responses that quickly identify ill-posed queries and ask for the missing premise. This implies a critical flaw in the current training recipe for reasoning LLMs, which does not adequately encourage efficient thinking and leads to the abuse of thinking patterns.
To further investigate the reasons behind such failures, we conduct fine-grained analyses of reasoning length, overthinking patterns, and the location of critical thinking across different types of LLMs.
Moreover, our extended ablation study reveals that overthinking is contagious through distillation of reasoning models' responses.
These results improve our understanding of overthinking and shed new light on mitigating the problem.
Our code and data can be found at: https://github.com/tianyi-lab/MiP-Overthinking.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1058