Keywords: multimodal learning, contrastive learning, multi-task learning
TL;DR: This paper proposes a novel multi-task-based multimodal contrastive learning method for multimodal representation learning, evaluated on multimodal classification tasks.
Abstract: Multimodal learning aims to imitate how humans acquire complementary information from multiple modalities to reach a final decision.
However, just as a human's final decision can be misled by erroneous information from the environment, current multimodal learning methods also suffer from uncertain unimodal predictions when learning multimodal representations. In this work, we propose to contrastively explore reliable representations and to increase the agreement among the unimodal representations that, on their own, make potentially correct predictions.
Specifically, we first capture task-related representations by directly sharing representations between the unimodal and multimodal learning tasks. Using the unimodal representations and predictions from this multi-task-based framework, we then propose a novel multimodal contrastive learning method that aligns the representations towards the relatively more reliable modality under the weak supervision of the unimodal predictions.
Experimental results on two image-text benchmarks, UPMC-Food-101 and N24News, and two medical benchmarks, ROSMAP and BRCA, show that our proposed Unimodality-supervised Multimodal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal learning methods. Detailed ablation studies further demonstrate the advantages of the proposed method.
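To make the abstract's description more concrete, below is a minimal sketch of a training step that combines a multi-task classification objective with an agreement-weighted cross-modal alignment term, weakly supervised by unimodal predictions. This is an illustrative assumption of how such a step could look in PyTorch, not the authors' released code: the function name unis_mmc_step, the "a modality is reliable if its unimodal prediction is correct" rule, and the simple cosine-similarity penalty are all placeholders for the paper's actual formulation.

```python
# Minimal sketch (assumed PyTorch interface, not the authors' implementation).
# Two unimodal encoders produce representations that are (1) shared with a
# multimodal head for multi-task training and (2) aligned toward whichever
# modality's unimodal prediction is correct on each sample.

import torch
import torch.nn.functional as F


def unis_mmc_step(feat_a, feat_b, logits_a, logits_b, logits_fused, labels, tau=0.1):
    """feat_*: [B, D] unimodal representations; logits_*: [B, C] class predictions."""
    # Multi-task objective: unimodal and multimodal classification losses
    # are trained jointly so the shared encoders capture task-related features.
    ce = (F.cross_entropy(logits_a, labels)
          + F.cross_entropy(logits_b, labels)
          + F.cross_entropy(logits_fused, labels))

    # Weak supervision from unimodal predictions (assumed rule): a modality is
    # treated as reliable on a sample if its unimodal prediction is correct.
    correct_a = logits_a.argmax(dim=1).eq(labels)
    correct_b = logits_b.argmax(dim=1).eq(labels)

    za = F.normalize(feat_a, dim=1)
    zb = F.normalize(feat_b, dim=1)
    sim = (za * zb).sum(dim=1) / tau  # per-sample cross-modal cosine similarity

    # Pull the two representations together when at least one modality is
    # reliable; push them apart when both unimodal predictions are wrong.
    agree = (correct_a | correct_b).float()
    align = -(agree * sim) + (1.0 - agree) * sim

    return ce + align.mean()


# Hypothetical usage with random tensors, only to show the expected shapes.
if __name__ == "__main__":
    B, D, C = 8, 256, 101
    loss = unis_mmc_step(torch.randn(B, D), torch.randn(B, D),
                         torch.randn(B, C), torch.randn(B, C),
                         torch.randn(B, C), torch.randint(0, C, (B,)))
    print(loss.item())
```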
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip