Gazelle: A Multimodal Learning System Robust to Missing Modalities

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: multimodal, multimodal classification, missing modality
Abstract: Typical multimodal classification systems exhibit degraded performance if one or more modalities are missing at test time. In this work, we propose a robust multimodal classification system, Gazelle, which is less susceptible to missing modalities. It consists of a single-branch network that shares weights across multiple modalities to learn intermodal representations. It introduces a novel training scheme featuring a modality switch mechanism over input embeddings, extracted using modality-specific networks, to maximise both performance and robustness to missing modalities. Extensive experiments are performed on four challenging datasets covering textual-visual (UPMC Food-$101$, Hateful Memes, Ferramenta) and audio-visual (VoxCeleb$1$) modalities. Gazelle achieves superior performance compared to existing state-of-the-art methods, both when all modalities are present and when modalities are missing.
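The abstract describes a modality switch over pre-extracted embeddings feeding a weight-shared single-branch network. The sketch below is a hypothetical illustration of that idea, not the authors' implementation: the function names (`modality_switch`, `shared_branch`), the toy linear projection, and the random selection policy are all assumptions made for clarity.

```python
import random

def modality_switch(embeddings: dict, rng: random.Random):
    """Hypothetical sketch of the modality switch: at each training
    step, one modality's embedding is selected and routed to the
    shared single-branch network, so the same weights are trained
    on every modality."""
    modality = rng.choice(sorted(embeddings))
    return modality, embeddings[modality]

def shared_branch(embedding, weights):
    # Toy stand-in for the weight-shared single-branch network:
    # one linear projection applied regardless of input modality.
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

# Usage with dummy embeddings standing in for the outputs of
# modality-specific extractor networks.
rng = random.Random(0)
embeddings = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
weights = [[0.5, -0.5], [1.0, 1.0]]
modality, emb = modality_switch(embeddings, rng)
logits = shared_branch(emb, weights)
```

Because a single set of weights processes whichever modality is switched in, the network can still produce a prediction at test time when one modality is absent; that is the intuition behind the robustness claim.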
Supplementary Material: pdf
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3373