Keywords: multimodal representation learning, probabilistic representation, image--caption retrieval
Abstract: Learning multimodal representations is a requirement for many tasks such as image--caption retrieval. Previous work on this problem has focused on finding good vector representations without any explicit measure of uncertainty. In this work, we argue and demonstrate that learning multimodal representations as probability distributions can lead to better representations and provide additional benefits, such as a measure of uncertainty over the learned representations. We show that this measure of uncertainty can capture how confident the model is about a representation in the multimodal domain, i.e., how easy it is for the model to retrieve or predict the matching pair. We also experiment with similarity metrics that have not traditionally been used for multimodal retrieval, and show that the choice of similarity metric affects the quality of the learned representations.
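A minimal sketch of the idea described in the abstract, not the paper's exact method: each modality encoder outputs a diagonal Gaussian (mean and log-variance) instead of a point vector, a distribution-level similarity scores image--caption pairs, and the total variance serves as the uncertainty measure. The module names, the 2-Wasserstein distance, and the variance-based uncertainty score are illustrative assumptions, since the paper's precise parameterization and metric are not given here.
```python
# Sketch only: probabilistic (Gaussian) embeddings for multimodal retrieval.
import torch
import torch.nn as nn

class ProbabilisticHead(nn.Module):
    """Maps backbone features to a diagonal Gaussian embedding (assumed design)."""
    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, embed_dim)
        self.logvar = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        # Returns the mean and log-variance of the embedding distribution.
        return self.mu(x), self.logvar(x)

def wasserstein2_sq(mu1, logvar1, mu2, logvar2):
    """Squared 2-Wasserstein distance between diagonal Gaussians.
    One possible distribution-level similarity; the paper may use others."""
    sigma1 = torch.exp(0.5 * logvar1)
    sigma2 = torch.exp(0.5 * logvar2)
    return ((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2).sum(dim=-1)

def uncertainty(logvar):
    """Total variance as a simple per-sample uncertainty score (assumption)."""
    return torch.exp(logvar).sum(dim=-1)
```
In such a setup, retrieval would rank candidates by the (negative) distribution distance, while the uncertainty score indicates how confident the model is that a clear matching pair exists.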
One-sentence Summary: Proposing probabilistic representations for multimodal scenarios such as image--caption retrieval
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=ZFnXz1Sqx9