Is the Glass Half-Empty or Half-Full? A Mixture-Of-Tasks Perspective on Missing Modality

24 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: missing modality, modality competition, multimodal learning, multimodal fusion
TL;DR: We propose the Missing Modality Performance Testbed (MMPT) to address missing-modality issues in multimodal learning setups by reframing missing-modality robustness analysis as a fundamental part of multimodal representation learning.
Abstract: A common issue in multimodal learning setups is the unavailability of one or more modalities. Historically, missing modality has been treated as a matter of robustness, with the aim of preventing performance degradation caused by stochastic loss of training and testing modalities. However, this perspective does not align with many scientific and industrial use cases of deep models, where unimodal inputs are more common than multiple modalities. Moreover, it poses practical challenges, complicating comparisons between studies and creating ambiguity about optimal model behavior. We instead propose a 'glass-half-full' approach---the Missing Modality Performance Testbed (MMPT)---which sheds light on the pivotal elements for enhancing model performance under missing modalities. MMPT reconceptualizes missing-modality robustness analysis as a fundamental aspect of multimodal representation learning. This formulation allows us to connect missing modality to modality competition, a line of work that aims to improve unimodal representations in a multimodal context for late-fusion models. We create a unified framework for both missing modality and modality competition by relaxing their architectural assumptions. Via this linkage, we explore how current approaches to missing modality impact the underlying model representations, and which representations are required for favorable performance. We validate this novel perspective on a wide variety of multimodal datasets with the intention of enabling simple and clear benchmarking for future research. Finally, we present a new state-of-the-art in missing-modality performance and identify potential areas for further improvement.
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8627