Cross-Modal Alignment via Variational Copula Modelling

Feng Wu; Tsai Hor Chan; Fuying Wang; Guosheng Yin; Lequan Yu

24 Sept 2024 (modified: 24 Jan 2025), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Copula, Multimodal learning, Missing modality

TL;DR: We develop a copula variational inference framework for cross-modal alignment.

Abstract: Various data modalities are common in real-world applications. In healthcare, for example, electronic health records, medical images, and clinical notes provide comprehensive information for diagnosis and treatment. Thus, it is essential to develop multimodal learning methods that aggregate information from multiple modalities to generate meaningful representations for downstream tasks. The key challenge is how to appropriately align the representations of the respective modalities and fuse them into a joint distribution. Existing methods mainly fuse the representations via concatenation or the Kronecker product, which oversimplifies the interaction structure between modalities and prompts the need to model more complex interactions. Moreover, the notion of a joint distribution over the latent representations that incorporates higher-order interactions between modalities remains underexplored. A copula is a powerful statistical tool for modelling the interactions between variables, as it bridges the joint distribution and the marginal distributions of multiple variables. In this paper, we propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities to capture the complex interactions among them. The key idea is to interpret the copula model as a tool for efficiently aligning the marginal distributions of the modalities. By assuming a Gaussian mixture distribution for each modality and a copula model on the joint distribution, our model can also generate accurate representations for missing modalities. Extensive experiments on public MIMIC datasets demonstrate that our model outperforms competing methods. Ablation studies also validate the effectiveness of the copula alignment strategy and the robustness of our model to different choices of copula family. Code is anonymously available at https://anonymous.4open.science/r/CM2-C1FD/README.md.
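
For readers unfamiliar with copulas, the way a copula "bridges" joint and marginal distributions can be stated through Sklar's theorem. The block below is a generic textbook statement, not the authors' exact formulation: the symbols $z_m$, $F_m$, $f_m$, $C$, $c$, $\pi_{mk}$, $\mu_{mk}$, and $\sigma_{mk}$ are illustrative assumptions, and each modality's latent variable is taken to be scalar for simplicity.

```latex
% Sklar's theorem for M modalities with latent variables z_1, ..., z_M
% (notation is illustrative and not taken from the paper).
% For any joint CDF F with marginal CDFs F_1, ..., F_M there is a copula C with
\[
  F(z_1, \dots, z_M) = C\bigl(F_1(z_1), \dots, F_M(z_M)\bigr),
  \qquad C : [0,1]^M \to [0,1],
\]
% and C is unique when the marginals are continuous.
% When densities exist, the joint density factorises as
\[
  f(z_1, \dots, z_M)
    = c\bigl(F_1(z_1), \dots, F_M(z_M)\bigr) \prod_{m=1}^{M} f_m(z_m),
\]
% where c is the copula density. An assumed Gaussian mixture marginal
% for modality m would read
\[
  f_m(z_m) = \sum_{k=1}^{K} \pi_{mk}\,
    \mathcal{N}\bigl(z_m \mid \mu_{mk}, \sigma_{mk}^2\bigr),
  \qquad \sum_{k=1}^{K} \pi_{mk} = 1 .
\]
```

Under this factorisation, the marginals $f_m$ describe each modality on its own, while the copula density $c$ carries all of the dependence between modalities. This is one way to read the abstract's claim that a missing modality can be generated from the observed ones.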

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 3579
