Looking beyond the surface with Contrastive LEarning with Anti-contrastive Regularization (CLEAR)

25 Sept 2024 (modified: 18 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Weakly Supervised Learning, Disentangled Representation Learning, Variational Autoencoder, Contrastive Learning
TL;DR: We propose a weakly supervised framework based on Contrastive LEarning with Anti-contrastive Regularization (CLEAR) to effectively disentangle and recognize $content$ and $style$ in the latent space.
Abstract: Learning representations that are robust to superficial sources of variability is important for ensuring that such variability does not impact downstream tasks. For instance, in healthcare applications, we might like to learn features that are useful for identifying pathology, yet have similar distributions across diverse demographic groups, leading to more accurate and equitable diagnoses regardless of background or surface characteristics. More broadly, this capability can improve the generalizability of our representations by mitigating unwanted effects of variability not seen during training. In this work, we suppose that data representations can be semantically separated into two components: $content$ and $style$. The $content$ consists of information needed for downstream tasks -- for example, it is predictive of the class label in a downstream classification problem -- whereas the $style$ consists of attributes that are superficial in the sense that they are irrelevant to downstream tasks, yet may compromise performance due to associations observed in training data that do not generalize. Here we propose a weakly supervised framework, Contrastive LEarning with Anti-contrastive Regularization (CLEAR), to effectively disentangle $content$ and $style$ in the latent space of a Variational Autoencoder (VAE). Our anti-contrastive penalty, which we call Pair Switching (PS), uses a novel label-flipping approach to ensure that $content$ information is captured effectively and confined to the $content$ features. We perform experiments to quantitatively and qualitatively evaluate CLEAR-VAE across distinct data modalities. We then analyze the trade-off between disentanglement and the ELBO, and the impact of various hyperparameters within our framework.
Our results show that using disentangled representations from CLEAR-VAE, we can: (a) swap and interpolate $content$ and $style$ between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of $content$ and $style$.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4033