Learning Multi-modal Representations Under Incomplete Data Via Dual Level Alignments

22 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Representation Learning, Multi-modal, Incomplete Data, Multi-Level Alignments
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We aim to learn modality-free representations of a wide variety of entity types that can be applied to multi-modal tasks under incomplete data, and propose a learning framework to this end.
Abstract: Our goal is to learn modality-free representations of a wide variety of entity types (e.g., text, image, object) that can be applied to multi-modal tasks under incomplete data (e.g., noisy data or missing modality information). While conventional methods train models over modality-specific features (e.g., image features via visual encoding) and decode them into contextual representations of specific modalities (e.g., images and text), our framework, {\it Multiple2Vec} (Mul2vec), is based on the idea that these features and the corresponding text are different views of the same entity, and it learns semantic representations without directly using modality-specific features. Mul2vec is a framework consisting of NTF and two training objectives, DLM and ILM. Since this idea implies that similar entities have similar representations at a dual level (contextual and semantic), Mul2vec aligns the two levels and optimizes the semantic representations against the corresponding contextual representations. Experiments show that Mul2vec learns semantic representations and contributes to pre-trained models for downstream tasks under incomplete data.
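The abstract describes aligning semantic representations with contextual representations of the same entity. The paper's actual NTF, DLM, and ILM objectives are not specified on this page, so the sketch below is only a generic illustration of one plausible alignment term: a mean cosine-distance loss between paired embeddings from the two levels. The function name and toy data are hypothetical, not taken from the submission.

```python
import numpy as np

def cosine_align_loss(semantic: np.ndarray, contextual: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between paired rows of two embedding
    matrices; 0 when the two views of each entity point the same way."""
    s = semantic / np.linalg.norm(semantic, axis=1, keepdims=True)
    c = contextual / np.linalg.norm(contextual, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * c, axis=1)))

# Toy example: 4 entities with 8-dim semantic vs. contextual embeddings.
rng = np.random.default_rng(0)
sem = rng.normal(size=(4, 8))
loss_self = cosine_align_loss(sem, sem)                   # identical views
loss_rand = cosine_align_loss(sem, rng.normal(size=(4, 8)))  # unrelated views
```

Minimizing such a term pulls the semantic embedding of each entity toward its contextual counterpart, which matches the "different views of the same entity" intuition stated in the abstract; the submission itself may use a different formulation.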
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5538