Self-Supervised Learning with Side Information

ICLR 2026 Conference Submission17268 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Self-supervised Learning, MultiView Learning, Joint Embedding Architectures, Side Information, Endoscopic Image Analysis
TL;DR: Identifies a failure mode of the MultiView assumption in SSL and proposes an information-theoretic framework using otherwise redundant side information. Improves generalization in both controlled settings and real-world tasks.
Abstract: A core assumption behind many successful self-supervised learning (SSL) methods is that different views of the same input share the information needed for downstream tasks. However, this MultiView assumption can be overly permissive in real-world settings, where task-irrelevant features may persist across views and become entangled with useful signals. Motivated by challenges in colonoscopy—where polyp cues must be isolated from dominant but irrelevant background textures—we present an information-theoretic analysis of this general failure mode in SSL. We further formalize this with our proposed Nuisance-Free MultiView (NF-MV) assumption, which reframes the goal of SSL as learning representations that are sufficient for task-relevant information while being invariant to shared nuisance structure. We theoretically show that such representations yield improved generalization, and derive an idealized objective that balances standard view alignment with a mutual information penalty on nuisance content. To implement this in practice, we introduce a method that leverages side information—auxiliary data that shares nuisance structure but does not contain any task-relevant signals. The nuisance penalty is then approximated using a Jensen-Shannon divergence between main and side representations, in a way that is tractable and compatible with standard joint embedding architectures. Experiments on synthetic tasks with spurious correlations and on real-world colonoscopy datasets demonstrate that the proposed method improves generalization for a wide range of SSL methods and architectures by learning the relevant features. These findings highlight the benefits of explicitly modelling what should not be preserved during self-supervised learning, offering a new and practical perspective on the MultiView framework.
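The abstract describes an objective that balances view alignment with a Jensen-Shannon divergence term between main and side representations. As a rough illustration of that shape (not the authors' implementation — the exact loss, normalisation, and sign convention here are assumptions), a minimal NumPy sketch might combine an alignment term over the two main views with a JSD term that pushes the main representation away from the nuisance-sharing side representation:

```python
import numpy as np

def softmax(x):
    """Normalise an embedding into a discrete distribution (illustrative choice)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (in nats)."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def nf_mv_loss(z1, z2, z_side, lam=1.0):
    """Hypothetical NF-MV-style loss: align the two main views, while
    encouraging the main representation to diverge from the side (nuisance)
    representation. The subtraction of the JSD term is an assumption."""
    align = np.mean((z1 - z2) ** 2)                       # standard view alignment
    penalty = js_divergence(softmax(z1), softmax(z_side)) # nuisance proxy
    return align - lam * penalty
```

JSD is symmetric and bounded (by ln 2 in nats), which makes it a numerically stable stand-in for an intractable mutual-information penalty; in practice the paper's tractable estimator and its integration with joint embedding architectures may differ from this toy formulation.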
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 17268