Optimal and Generalizable Multimodal Representation Learning Framework through Adaptive Graph Construction

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: contrastive learning, multimodal representation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Multimodal contrastive learning trains neural networks by leveraging data from heterogeneous sources such as images and text. Yet current multimodal learning architectures cannot generalize to an arbitrary number of modalities, must be hand-constructed, and are often not robust to missing modalities. We propose AutoBIND, a novel contrastive learning framework that can learn representations from an arbitrary number of modalities. AutoBIND uses a graph-based approach to automatically select the most correlated modalities and a contrastive loss to learn the representations. AutoBIND is robust to missing modalities because it can dynamically update the graph structure during training. In the proposed framework, each modality maps to a shared embedding space, and the correlation between two modalities serves as a measure of their similarity. The graph structure therefore improves dynamically during training, purely as a result of minimizing the contrastive loss. We evaluate AutoBIND on a wide variety of datasets, including tasks such as Alzheimer's disease detection and house price prediction, and across a broad range of data modalities: 3D images, 2D images, and tables. We show that AutoBIND outperforms previous methods on these tasks, highlighting the generalizability of the approach.
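The abstract only sketches the algorithm, so the following is a minimal, hypothetical illustration of the idea it describes, not the authors' implementation: embeddings from each modality are compared pairwise, the most correlated modality pairs form edges of a graph that is re-estimated during training, and a contrastive loss is applied along those edges. All names (`build_edges`, `info_nce`), the choice of mean cosine similarity as the correlation measure, the number of retained edges `k`, and the InfoNCE temperature are assumptions for illustration.

```python
# Hypothetical sketch of adaptive graph construction for multimodal
# contrastive learning, loosely following the abstract's description.
import torch
import torch.nn.functional as F

def correlation(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between two batches of embeddings
    (assumed proxy for the paper's modality correlation)."""
    return F.cosine_similarity(a, b, dim=-1).mean()

def build_edges(embeddings: dict[str, torch.Tensor], k: int = 1) -> list[tuple[str, str]]:
    """Keep the k most correlated modality pairs as graph edges.
    Recomputed each step, so the graph adapts as training proceeds
    and missing modalities can simply be left out of `embeddings`."""
    names = list(embeddings)
    scored = [
        (correlation(embeddings[u], embeddings[v]).item(), u, v)
        for i, u in enumerate(names) for v in names[i + 1:]
    ]
    scored.sort(reverse=True)
    return [(u, v) for _, u, v in scored[:k]]

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two aligned batches of embeddings."""
    logits = F.normalize(a) @ F.normalize(b).T / temperature
    labels = torch.arange(a.size(0))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Toy usage: three modalities mapped into a shared 128-d space, batch of 8.
emb = {m: torch.randn(8, 128, requires_grad=True) for m in ("3d_image", "2d_image", "table")}
edges = build_edges(emb, k=2)                       # graph re-estimated each step
loss = sum(info_nce(emb[u], emb[v]) for u, v in edges)
loss.backward()
```

Because edge selection uses `.item()`, gradients flow only through the contrastive loss itself; the graph improves across steps as the embeddings it is built from improve, which matches the abstract's claim that the structure updates purely as a result of minimizing the contrastive loss.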
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 996