Deep Thinking on Out-Of-Distribution Data: How can we know when a model is overthinking?

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Overthinking, Test Time Scaling, Distribution Shift, Self-Supervised Learning
TL;DR: We study the extrapolation capacity of deep thinking models on object recognition and propose a novel self-supervised method for detecting when these models overthink under out-of-distribution shift.
Abstract: Deep thinking models, a class of recurrent architectures, can generalize from easy to hard examples by allocating more computation during inference. While effective in logical reasoning tasks, their potential for test-time adaptation in computer vision under out-of-distribution (OOD) data remains underexplored. This work investigates deep thinking as a test-time scaling strategy for object recognition under distribution-shift settings. We show that while thinking longer can improve performance, it also introduces the risk of overthinking, where excessive computation degrades accuracy. To address this, we propose a self-supervised proxy task that dynamically detects overthinking and approximates the peak accuracy without requiring ground-truth labels. Across multiple OOD object-recognition benchmarks, deep thinking with our proxy delivers consistent gains and accuracy close to the peak while avoiding overthinking-related drops.
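
The abstract does not specify the proxy task or the stopping rule, so the following is only a minimal sketch of the general idea: at test time, iterate a recurrent classifier, track a label-free self-supervised score at each step, and return the prediction from the step where the proxy peaks. The `ToyDeepThinker` architecture, the rotation-prediction proxy, and all hyperparameters below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDeepThinker(nn.Module):
    """A tiny recurrent classifier: one shared conv 'core' applied repeatedly."""

    def __init__(self, channels=32, num_classes=10):
        super().__init__()
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.core = nn.Conv2d(channels, channels, 3, padding=1)  # shared across iterations
        self.classify = nn.Linear(channels, num_classes)         # object-recognition head
        self.rotation_head = nn.Linear(channels, 4)              # proxy head: 0/90/180/270 deg

    def init_state(self, x):
        return F.relu(self.encode(x))

    def step(self, state):
        return F.relu(self.core(state))

    def readout(self, state, head):
        pooled = state.mean(dim=(2, 3))  # global average pooling
        return head(pooled)


@torch.no_grad()
def think_with_proxy(model, x, max_iters=30, patience=3):
    """Run recurrent iterations; keep the prediction from the iteration where
    the self-supervised rotation proxy peaks, and stop once it degrades."""
    # Rotated copies of the batch; the proxy label is the rotation index.
    rots = torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)], dim=0)
    rot_labels = torch.arange(4).repeat_interleave(x.size(0))

    state = model.init_state(x)
    rot_state = model.init_state(rots)
    best_score, best_logits, since_best = float("-inf"), None, 0

    for _ in range(max_iters):
        state = model.step(state)
        rot_state = model.step(rot_state)

        logits = model.readout(state, model.classify)                # object prediction
        rot_logits = model.readout(rot_state, model.rotation_head)   # proxy prediction
        score = (rot_logits.argmax(1) == rot_labels).float().mean().item()

        if score > best_score:
            best_score, best_logits, since_best = score, logits, 0
        else:
            since_best += 1
            if since_best >= patience:  # proxy degrading: likely overthinking
                break

    return best_logits


if __name__ == "__main__":
    model = ToyDeepThinker().eval()
    images = torch.randn(8, 3, 32, 32)  # stand-in for an OOD batch
    print(think_with_proxy(model, images).argmax(1))
```

The key design choice this sketch illustrates is that the stopping signal requires no ground-truth labels: the proxy head is evaluated on transformed copies of the test input itself, so it can be monitored per iteration on arbitrary OOD data.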
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 6684