Keywords: Multi-View Representation Learning, Multimodal Representation Learning, Contrastive Learning
Abstract: Multi-View Representation Learning (MVRL) aims to learn a joint representation from diverse data sources by discovering the complex relationships among them.
In MVRL, since the downstream task and the availability of views are often unknown a priori, it is essential for the joint representation to be robust to the partial availability of views.
However, existing methods exhibit various limitations, such as discarding potentially valuable view-specific information, lacking the ability to extract representations from an arbitrary subset of views, or requiring computational resources that grow exponentially with the number of views.
To address these challenges, we present a scalable MVRL framework based on contrastive learning.
Our approach employs a set of encoders that can extract representations from an arbitrary subset of views, and jointly trains them with a computational cost that scales linearly with the number of views.
We conducted comprehensive evaluations across 7 MVRL benchmark datasets spanning 2 to 8 views, demonstrating that our method robustly handles diverse combinations of input views and outperforms strong baselines.
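To illustrate how a set of per-view encoders with a linearly scaling contrastive objective might look, below is a minimal PyTorch sketch. It is not the authors' implementation: the class name `MultiViewContrastive`, the encoder architecture, and the choice of contrasting each view against the mean of the remaining views are all illustrative assumptions. It only demonstrates why such an objective yields one InfoNCE term per view, i.e., a cost linear in the number of views.

```python
# Hypothetical sketch (NOT the paper's method): one encoder per view, plus an
# InfoNCE-style loss pairing each view with the mean of the remaining views,
# so the number of contrastive terms grows linearly with the view count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewContrastive(nn.Module):
    def __init__(self, view_dims, embed_dim=128, temperature=0.1):
        super().__init__()
        # One encoder per view; any subset of encoders can be used at test time.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, embed_dim))
            for d in view_dims
        )
        self.temperature = temperature

    def embed(self, views, available):
        # views: list of (batch, dim) tensors; available: indices of present views.
        return [F.normalize(self.encoders[i](views[i]), dim=-1) for i in available]

    def loss(self, views):
        zs = self.embed(views, range(len(views)))  # use all views during training
        z = torch.stack(zs)                        # (num_views, batch, embed_dim)
        total = 0.0
        for i in range(len(zs)):
            # Anchor: view i. Positive: mean of the other views of the same sample.
            others = F.normalize((z.sum(0) - z[i]) / (len(zs) - 1), dim=-1)
            logits = zs[i] @ others.t() / self.temperature  # (batch, batch)
            labels = torch.arange(logits.size(0))
            total = total + F.cross_entropy(logits, labels)
        return total / len(zs)  # one InfoNCE term per view: O(num_views) cost

# Toy usage with two views of random data.
model = MultiViewContrastive(view_dims=[32, 48])
views = [torch.randn(16, 32), torch.randn(16, 48)]
print(model.loss(views).item())
```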
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10222