Keywords: self-supervised learning, domain shift
Abstract: Learning compact representations that preserve semantics while discarding nuisance variation is central to self-supervised learning (SSL).
However, when training data come from heterogeneous domains, instance-level contrastive learning often treats cross-domain yet semantically similar samples as false negatives and entangles domain cues with semantic features, yielding domain-clustered representations that generalize poorly to novel domains. To address this issue, we propose Structured Contrastive Learning (SCL).
This unified framework jointly learns (i) a semantic representation $\mathbf{z}_s$ via semantic contrast, (ii) a domain representation $\mathbf{z}_d$ via domain contrast, and (iii) their disentanglement by minimizing the dependence (mutual information) between $\mathbf{z}_s$ and $\mathbf{z}_d$. This structure preserves domain-invariant semantics in $\mathbf{z}_s$ while isolating domain factors in $\mathbf{z}_d$, enabling robust self-supervised training on data from a mixture of domains and out-of-domain (OOD) generalization to novel domains. Theoretically, we prove that SCL's objectives extract semantic and domain information separately, and that minimizing the mutual information between $\mathbf{z}_s$ and $\mathbf{z}_d$ enhances the model's generalization under domain shift. Empirically, we validate SCL on multi-domain training tasks and on generalization to novel domains through experiments on multiple datasets spanning multiple modalities.
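The three-part objective described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an InfoNCE-style loss for both the semantic and domain contrasts, and uses a squared cross-covariance penalty as a simple stand-in for the mutual-information term; the weight `lam` and all variable names are illustrative.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: each anchor should match its own positive
    against all other positives in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positive pairs on the diagonal

def cross_covariance_penalty(z_s, z_d):
    """Dependence proxy: squared cross-covariance between the semantic and
    domain codes (a crude stand-in for the mutual-information term)."""
    z_s = z_s - z_s.mean(axis=0)
    z_d = z_d - z_d.mean(axis=0)
    c = z_s.T @ z_d / len(z_s)
    return np.sum(c ** 2)

# Toy batch of semantic/domain codes and their augmented positives.
rng = np.random.default_rng(0)
B, D = 8, 16
z_s, z_s_pos = rng.normal(size=(B, D)), rng.normal(size=(B, D))
z_d, z_d_pos = rng.normal(size=(B, D)), rng.normal(size=(B, D))

# Joint objective: semantic contrast + domain contrast + disentanglement.
lam = 1.0  # illustrative trade-off weight
loss = (info_nce(z_s, z_s_pos)
        + info_nce(z_d, z_d_pos)
        + lam * cross_covariance_penalty(z_s, z_d))
print(float(loss))
```

In practice, `z_s` and `z_d` would be two heads of an encoder trained end-to-end, and the MI term would typically be estimated with a learned critic rather than a covariance penalty.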
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 5018