CLIP as a Prior Teacher: Breaking the Label Dependency in Semi-Supervised Learning

ICLR 2026 Conference Submission 18046 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: semi-supervised learning, CLIP, co-training, adapter-tuning, low-label regimes, vision-language model
Abstract: Semi-supervised learning (SSL) has shown remarkable potential in scenarios with limited labeled data. However, our study reveals that existing SSL approaches remain inherently label-dependent: their ability to exploit unlabeled samples is bounded by the quantity and quality of the labeled data. To address this limitation, we propose CaPT, a portable asymmetric-modalities co-training framework that efficiently integrates CLIP into SSL. CaPT aggregates the predictions of a fully fine-tuned unimodal network and a parameter-efficiently fine-tuned multimodal CLIP model into carefully designed co-pseudo labels, which refine CLIP's biased predictions and supply reliable priors to the SSL learner without compromising efficiency. Moreover, the asymmetric-modalities design mitigates the pattern-homogeneity bottleneck observed in previous co-training methods, enabling richer cross-model information exchange. CaPT consistently achieves state-of-the-art performance across multiple SSL benchmarks. Notably, it outperforms the second-best method by **21.38%** and **4.05%** on the CIFAR-100 and EuroSAT datasets, respectively, under the one-label-per-class setting, demonstrating its strong potential in low-label regimes.
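To make the co-pseudo-labeling idea concrete, here is a minimal PyTorch sketch of one plausible fusion step: predictions from the two branches are averaged and thresholded into shared pseudo-labels. The mixing weight `alpha`, threshold `tau`, and function name are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of co-pseudo-label fusion; not the authors' exact method.
import torch
import torch.nn.functional as F

def co_pseudo_labels(logits_unimodal: torch.Tensor,
                     logits_clip: torch.Tensor,
                     alpha: float = 0.5,
                     tau: float = 0.95):
    """Fuse two branches' predictions into shared pseudo-labels.

    logits_unimodal: [B, C] logits from the fully fine-tuned unimodal network.
    logits_clip:     [B, C] logits from the adapter-tuned CLIP branch.
    alpha:           assumed mixing weight between the two branches.
    tau:             assumed confidence threshold for keeping a pseudo-label.
    """
    # Convex combination of the two branches' class distributions.
    probs = alpha * F.softmax(logits_unimodal, dim=-1) \
          + (1.0 - alpha) * F.softmax(logits_clip, dim=-1)
    conf, labels = probs.max(dim=-1)
    mask = conf >= tau  # keep only confident co-pseudo labels
    return labels, mask
```

Both branches would then be trained on the masked labels, so the CLIP prior and the unimodal learner correct each other rather than drifting independently.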
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18046