UniS-MMC: Learning Unimodality-supervised Multimodal Contrastive Representations

22 Sept 2022 (modified: 06 May 2024) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: multimodal learning, contrastive learning, multi-task learning
TL;DR: This paper proposes a novel multi-task-based multimodal contrastive method for multimodal representation learning, evaluated on multimodal classification tasks.
Abstract: Multimodal learning aims to imitate humans in acquiring complementary information from multiple modalities for final decisions. However, just as a human's final decision can be misled by specific erroneous information from the environment, current multimodal learning methods also suffer from uncertain unimodal predictions when learning multimodal representations. In this work, we propose to contrastively explore reliable representations and increase the agreement among the unimodal representations that alone make potentially correct predictions. Specifically, we first capture task-related representations by directly sharing representations between unimodal and multimodal learning tasks. With the unimodal representations and predictions from this multi-task-based framework, we then propose a novel multimodal contrastive learning method that aligns the representations toward the relatively more reliable modality under the weak supervision of the unimodal predictions. Experimental results on two image-text benchmarks, UPMC-Food-101 and N24News, and two medical benchmarks, ROSMAP and BRCA, show that our proposed Unimodality-supervised Multimodal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal learning methods. Detailed ablation studies further demonstrate the advantage of the proposed method.
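The abstract describes using unimodal predictions as weak supervision to decide which modality is more reliable and to pull the other modality's representation toward it. The following is a minimal illustrative sketch of that idea, not the authors' released code; the function name `unis_mmc_loss`, the temperature value, and the weighting scheme are assumptions for illustration, and in the described multi-task framework this term would be combined with the unimodal and multimodal classification losses.

```python
# Sketch of a unimodality-supervised multimodal contrastive loss (illustrative only).
# Assumption: two modalities A and B, each with a representation and a unimodal prediction.
import torch
import torch.nn.functional as F

def unis_mmc_loss(z_a, z_b, logits_a, logits_b, labels, temperature=0.07):
    """z_a, z_b: (N, D) unimodal representations; logits_a, logits_b: (N, C)
    unimodal predictions; labels: (N,) ground-truth classes."""
    # Weak supervision: which modality's unimodal prediction is correct per sample.
    correct_a = logits_a.argmax(dim=-1).eq(labels).float()
    correct_b = logits_b.argmax(dim=-1).eq(labels).float()

    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    targets = torch.arange(z_a.size(0), device=z_a.device)

    # InfoNCE-style cross-modal alignment; the more reliable side is detached
    # so it serves as the anchor that the other modality is pulled toward.
    loss_a_to_b = F.cross_entropy(z_a @ z_b.detach().t() / temperature,
                                  targets, reduction="none")
    loss_b_to_a = F.cross_entropy(z_b @ z_a.detach().t() / temperature,
                                  targets, reduction="none")

    # Modality B supervises A only where B's unimodal prediction is correct, and vice versa.
    denom = (correct_a + correct_b).clamp(min=1.0)
    return ((correct_b * loss_a_to_b + correct_a * loss_b_to_a) / denom).mean()
```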
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip