Learning Multi-Modal Representation Alignments from Noisy Data-Pairs

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: multi-modal learning, contrastive learning, foundation models
TL;DR: We reformulate standard contrastive learning in a probabilistic framework and introduce learnable random weights associated with data pairs, allowing the degree of noisiness of each pair to be inferred automatically.
Abstract: Contrastive learning (CL) is one of the most successful paradigms for self-supervised representation learning and underpins state-of-the-art multi-modal learning applications. An overlooked limitation of standard contrastive learning, however, is that it is not designed for robust learning in the presence of noisy data pairs. For example, not all negative samples are truly negative: within a mini-batch there can be negative samples that are semantically as relevant as the positive sample. This is common in web-sourced multi-modal datasets such as CC3M and YFCC, which are frequently used for CL, because of the noise introduced when the data are crawled. Consequently, noise in the datasets can significantly impair the power of CL. To remedy this issue, we propose a novel solution that reformulates standard CL in a probabilistic framework and introduces learnable random weights associated with data pairs, allowing the degree of noisiness of each pair to be inferred automatically. Within our probabilistic framework, posterior inference of the random weights can be performed efficiently with Bayesian data augmentation, and the model can then be optimized effectively with a novel learning algorithm based on stochastic expectation maximization. We demonstrate the effectiveness of our approach on several standard multi-modal contrastive learning benchmarks, where it significantly outperforms standard contrastive learning.
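To make the core idea of per-pair weighting concrete, below is a minimal sketch (not the paper's actual algorithm) of a CLIP-style symmetric InfoNCE loss in which every image-text pair in the batch carries a learnable weight, so that pairs believed to be noisy can be down-weighted. The function name, the `log_weights` parameterization, and the use of a sigmoid are illustrative assumptions; the paper instead infers the weights via Bayesian data augmentation and stochastic expectation maximization.

```python
# Minimal sketch, assuming a CLIP-style setup: a weighted symmetric InfoNCE loss
# with one learnable weight per image-text pair. This is an illustration of
# per-pair weighting, not the authors' Bayesian/stochastic-EM procedure.
import torch
import torch.nn.functional as F


def pair_weighted_info_nce(img_emb, txt_emb, log_weights, temperature=0.07):
    """Symmetric InfoNCE over a batch, with a per-pair weight on each positive.

    img_emb, txt_emb: (B, D) L2-normalized embeddings of paired images/texts.
    log_weights:      (B,) learnable log-weights; a sigmoid maps them to (0, 1),
                      playing the role of the inferred "cleanliness" of a pair.
    """
    logits = img_emb @ txt_emb.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)

    # Per-pair cross-entropy in both directions (image->text and text->image).
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")

    # Down-weight pairs believed to be noisy. Here the weights are simply
    # learned jointly with the encoders; the paper infers them probabilistically.
    w = torch.sigmoid(log_weights)
    return (w * (loss_i2t + loss_t2i)).mean() / 2


# Usage example with random embeddings and one weight per data pair.
if __name__ == "__main__":
    B, D = 8, 256
    img = F.normalize(torch.randn(B, D), dim=-1)
    txt = F.normalize(torch.randn(B, D), dim=-1)
    log_w = torch.zeros(B, requires_grad=True)              # one weight per pair
    loss = pair_weighted_info_nce(img, txt, log_w)
    loss.backward()
    print(float(loss))
```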
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2836