HGT-UCOD: A Hint-Guided Teacher Framework for Unsupervised Camouflaged Object Detection

08 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Camouflaged Object Detection
Abstract: Camouflaged Object Detection (COD) holds significant promise for high-stakes applications, yet its progress is fundamentally bottlenecked by a heavy reliance on large-scale, pixel-level annotated data. While Unsupervised Domain Adaptation (UDA) offers a promising path forward, real-world scenarios often impose stricter constraints due to data privacy, leaving only a pre-trained source model available: a more challenging setting known as source-free domain adaptation. A critical flaw in current methods is their direct use of the source model (e.g., one trained for salient object detection) to generate pseudo-labels. The inherent "saliency bias" of such models, an inclination to find objects that "stand out" rather than "blend in", results in incomplete and noisy labels that capture only the most conspicuous parts of a target. Self-training on this flawed guidance inevitably falls into confirmation bias, amplifying initial errors and limiting performance. We address this problem with a shift in perspective: instead of treating the biased predictions as mere noise, we reframe their high-confidence fragments as reliable "hints". Based on this philosophy, we propose HGT-UCOD, a novel Hint-Guided Teacher framework that guides the model to infer the complete object from these sparse yet trustworthy cues. The cornerstone of our framework is a teacher pre-adaptation stage, in which we first cultivate an "expert teacher" by training it to infer the full object from partial views containing only these "hints", thus building specialized knowledge. Subsequently, during student refinement, this expert teacher collaborates with the source model to generate high-quality pseudo-labels via a dynamic fusion strategy. This process is further enhanced by strong consistency regularization, which forces the student to learn robust, perturbation-invariant features. To support this inference, both the teacher and student models are equipped with a novel Dynamic Convolution Mixture (DCM) module, which adaptively generates content-aware kernels to capture the subtle, context-dependent features of camouflaged objects. Extensive experiments on multiple benchmark datasets demonstrate that our method achieves superior performance, establishing a new state of the art for source-free unsupervised COD.
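The hint-extraction and pseudo-label fusion steps are only described at a high level in the abstract. The following is a minimal sketch, not the authors' released code: the thresholds, the confidence-weighted fusion rule, and the function names (`extract_hints`, `fuse_pseudo_labels`) are illustrative assumptions. It shows how high-confidence source predictions could serve as sparse hints while an adapted teacher fills in the full object extent.

```python
# Hypothetical sketch of hint extraction and teacher/source pseudo-label fusion.
import torch

def extract_hints(source_pred, fg_thresh=0.9, bg_thresh=0.1):
    """Keep only the most confident foreground/background pixels as hints.

    source_pred: (B, 1, H, W) sigmoid probabilities from the source model.
    Returns a hint map in {0, 0.5, 1}, where 0.5 marks ignored (uncertain) pixels.
    """
    hints = torch.full_like(source_pred, 0.5)      # uncertain by default
    hints[source_pred >= fg_thresh] = 1.0          # trusted foreground hints
    hints[source_pred <= bg_thresh] = 0.0          # trusted background hints
    return hints

def fuse_pseudo_labels(teacher_pred, source_pred):
    """Confidence-weighted fusion of teacher and source predictions.

    Pixels where the source model is confident keep the source prediction;
    elsewhere the adapted teacher's prediction dominates.
    """
    source_conf = (source_pred - 0.5).abs() * 2.0  # per-pixel confidence in [0, 1]
    fused = source_conf * source_pred + (1.0 - source_conf) * teacher_pred
    return (fused > 0.5).float()                   # binarized pseudo-label
```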
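Likewise, the DCM module is only summarized above. Below is a minimal sketch of a dynamic convolution mixture in that spirit, assuming a CondConv-style design in which a lightweight gate predicts per-sample mixture weights over K candidate kernels; the expert count, gate architecture, and class name are assumptions rather than the paper's exact configuration.

```python
# Hypothetical sketch of a content-aware dynamic convolution mixture block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvMixture(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, num_experts=4):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.kernel_size = kernel_size
        # K candidate kernels and biases (the mixture components).
        self.weight = nn.Parameter(
            0.02 * torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(num_experts, out_ch))
        # Lightweight gate: global context -> softmax weights over the K experts.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, num_experts),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = F.softmax(self.gate(x), dim=1)                        # (B, K)
        # Mix the expert kernels per sample, conditioned on image content.
        weight = torch.einsum('bk,koihw->boihw', alpha, self.weight)  # (B, out, in, k, k)
        bias = torch.einsum('bk,ko->bo', alpha, self.bias)            # (B, out)
        # Apply the per-sample kernels in one call via a grouped convolution.
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            weight.reshape(b * self.out_ch, self.in_ch, self.kernel_size, self.kernel_size),
            bias.reshape(-1),
            padding=self.kernel_size // 2,
            groups=b,
        )
        return out.reshape(b, self.out_ch, h, w)
```

Under these assumptions, a layer like this could stand in for a standard 3x3 convolution in the teacher and student networks; for example, `DynamicConvMixture(64, 64)` maps a (B, 64, H, W) feature map to one of the same spatial size while letting the effective kernel adapt to each image.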
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3003