Learning to Associate: Multimodal Inference with Fully Missing Modalities

Jack Geraghty, Andrew Hines, Fatemeh Golpayegani

Published: 2025, Last Modified: 06 May 2026, Venue: ACM Trans. Intell. Syst. Technol. 2025, License: CC BY-SA 4.0
Abstract: We propose Cross-Modal Association Models (C-MAMs), a novel approach for handling modalities that are missing at inference time in multimodal learning. Unlike existing methods that modify the training process, C-MAMs generate the missing modality's features post-training, preserving the integrity of the original multimodal model. In this article, we: (i) formalise the problem of missing-modality inference and its challenges, (ii) introduce C-MAMs as a flexible, lightweight, post-hoc solution for reconstructing missing modality embeddings, (iii) evaluate their effectiveness across diverse datasets, tasks, and baseline models, and (iv) compare the generated features with the ground-truth features to quantify reconstruction fidelity. Experimental results show that C-MAMs significantly mitigate the performance degradation caused by missing modalities, in some cases fully restoring baseline performance, even when trained on 10% of the data. We conclude that post-training feature reconstruction is an effective, targeted alternative to existing methods, with broad applicability in multimodal systems.
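The post-hoc idea in the abstract (learn a mapping from an available modality's embedding to the missing modality's embedding, without touching the original model) can be illustrated with a toy sketch. This is not the authors' architecture: the abstract does not specify the C-MAM design or loss, so the linear map, the mean-squared-error objective, the synthetic embeddings, and every name below (`fit_association`, `reconstruct`, `hypothetical_missing_embedding`) are assumptions chosen only for illustration.

```python
import random

def reconstruct(W, b, s):
    """Predict a missing-modality embedding from an available-modality embedding s."""
    return [sum(w * x for w, x in zip(row, s)) + bias for row, bias in zip(W, b)]

def mean_squared_error(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def fit_association(src, tgt, lr=0.2, epochs=300):
    """Fit a linear map (assumed form) by full-batch gradient descent on squared error."""
    d_in, d_out, n = len(src[0]), len(tgt[0]), len(src)
    W = [[0.0] * d_in for _ in range(d_out)]
    b = [0.0] * d_out
    for _ in range(epochs):
        grad_W = [[0.0] * d_in for _ in range(d_out)]
        grad_b = [0.0] * d_out
        for s, t in zip(src, tgt):
            pred = reconstruct(W, b, s)
            for i in range(d_out):
                err = (pred[i] - t[i]) / n  # per-sample share of the batch gradient
                grad_b[i] += err
                for j in range(d_in):
                    grad_W[i][j] += err * s[j]
        for i in range(d_out):
            b[i] -= lr * grad_b[i]
            for j in range(d_in):
                W[i][j] -= lr * grad_W[i][j]
    return W, b

# Synthetic stand-ins for embeddings produced by frozen, already-trained encoders.
rng = random.Random(0)

def hypothetical_missing_embedding(s):
    # Hypothetical ground-truth relation between the two modalities' embeddings.
    return [0.5 * s[0] - 1.0 * s[1] + 0.2, 0.8 * s[2] + 0.3 * s[0] - 0.1]

src = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(250)]
tgt = [hypothetical_missing_embedding(s) for s in src]
train_src, train_tgt = src[:200], tgt[:200]
test_src, test_tgt = src[200:], tgt[200:]

W, b = fit_association(train_src, train_tgt)

# Compare generated and ground-truth embeddings on held-out samples.
held_out_mse = sum(
    mean_squared_error(reconstruct(W, b, s), t) for s, t in zip(test_src, test_tgt)
) / len(test_src)
print(f"held-out reconstruction MSE: {held_out_mse:.2e}")
```

In a real system the output of `reconstruct` would be passed to the unmodified multimodal model in place of the absent modality's embedding, and the held-out comparison against ground-truth embeddings is a simple stand-in for the fidelity analysis the abstract mentions.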