Keywords: multimodal learning, contrastive learning, multi-task learning
TL;DR: This paper proposes a novel multi-task-based multimodal contrastive learning method for multimodal representation learning, evaluated on multimodal classification tasks.
Abstract: Multimodal learning aims to imitate how humans acquire complementary information from multiple modalities to reach a final decision.
However, just as a human's final decision can be misled by erroneous information from the environment, current multimodal learning methods also suffer from uncertain unimodal predictions when learning multimodal representations. In this work, we propose to contrastively explore reliable representations and to increase the agreement among the unimodal representations that, on their own, make potentially correct predictions.
Specifically, we first capture task-related representations by directly sharing representations between the unimodal and multimodal learning tasks. Using the unimodal representations and predictions from this multi-task-based framework, we then propose a novel multimodal contrastive learning method that aligns the representations towards the relatively more reliable modality under the weak supervision of the unimodal predictions.
Experimental results on two image-text benchmarks, UPMC-Food-101 and N24News, and two medical benchmarks, ROSMAP and BRCA, show that our proposed Unimodality-supervised Multimodal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal learning methods. Detailed ablation studies further demonstrate the advantages of the proposed method.
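To make the abstract's description more concrete, below is a minimal sketch of a training step that combines a multi-task classification objective with an agreement-weighted cross-modal alignment term, weakly supervised by unimodal predictions. This is an illustrative assumption of how such a step could look in PyTorch, not the authors' released code: the function name unis_mmc_step, the "a modality is reliable if its unimodal prediction is correct" rule, and the simple cosine-similarity penalty are all placeholders for the paper's actual formulation.

```python
# Minimal sketch (assumed PyTorch interface, not the authors' implementation).
# Two unimodal encoders produce representations that are (1) shared with a
# multimodal head for multi-task training and (2) aligned toward whichever
# modality's unimodal prediction is correct on each sample.

import torch
import torch.nn.functional as F


def unis_mmc_step(feat_a, feat_b, logits_a, logits_b, logits_fused, labels, tau=0.1):
    """feat_*: [B, D] unimodal representations; logits_*: [B, C] class predictions."""
    # Multi-task objective: unimodal and multimodal classification losses
    # are trained jointly so the shared encoders capture task-related features.
    ce = (F.cross_entropy(logits_a, labels)
          + F.cross_entropy(logits_b, labels)
          + F.cross_entropy(logits_fused, labels))

    # Weak supervision from unimodal predictions (assumed rule): a modality is
    # treated as reliable on a sample if its unimodal prediction is correct.
    correct_a = logits_a.argmax(dim=1).eq(labels)
    correct_b = logits_b.argmax(dim=1).eq(labels)

    za = F.normalize(feat_a, dim=1)
    zb = F.normalize(feat_b, dim=1)
    sim = (za * zb).sum(dim=1) / tau  # per-sample cross-modal cosine similarity

    # Pull the two representations together when at least one modality is
    # reliable; push them apart when both unimodal predictions are wrong.
    agree = (correct_a | correct_b).float()
    align = -(agree * sim) + (1.0 - agree) * sim

    return ce + align.mean()


# Hypothetical usage with random tensors, only to show the expected shapes.
if __name__ == "__main__":
    B, D, C = 8, 256, 101
    loss = unis_mmc_step(torch.randn(B, D), torch.randn(B, D),
                         torch.randn(B, C), torch.randn(B, C),
                         torch.randn(B, C), torch.randint(0, C, (B,)))
    print(loss.item())
```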
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip