Abstract: We propose Cross-Modal Association Models (C-MAMs), a novel approach for handling missing modalities during inference in multimodal learning. Unlike existing methods that modify the training process, C-MAMs generate the features of the missing modality post-training, preserving the integrity of the original multimodal model. In this article, we: (i) formalise the problem of missing modality inference and its challenges, (ii) introduce C-MAMs as a flexible, lightweight, post-hoc solution for reconstructing missing modality embeddings, (iii) evaluate their effectiveness across diverse datasets, tasks and baseline models, and (iv) analyse the quality of the generated features against the ground-truth features to quantify reconstruction fidelity. Experimental results show that C-MAMs significantly mitigate the performance degradation caused by missing modalities, in some cases fully restoring baseline performance, even when trained on 10% of the data. We conclude that post-training feature reconstruction is an effective, targeted alternative to existing methods, with broad applicability in multimodal systems.
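To make the idea of post-hoc feature reconstruction concrete, the following is a minimal sketch and not the authors' architecture: it assumes the association model is a plain affine map, fitted by gradient descent on squared error, from the embedding of an available modality to the embedding of a missing one. The names `audio_emb`, `text_emb`, `fit_association_map` and `reconstruct`, the synthetic data, and the choice of modalities are all hypothetical; the abstract does not specify the model family, loss, or data.

```python
import random


def reconstruct(W, b, x):
    """Map an available-modality embedding x to an estimate of the missing one."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def fit_association_map(available, missing, lr=0.1, epochs=2000):
    """Fit an affine map (W, b) with full-batch gradient descent on squared error.

    Assumption: a linear map stands in for the association model; the
    encoders that produced the embeddings are treated as frozen.
    """
    d_in, d_out = len(available[0]), len(missing[0])
    n = len(available)
    W = [[0.0] * d_in for _ in range(d_out)]
    b = [0.0] * d_out
    for _ in range(epochs):
        grad_W = [[0.0] * d_in for _ in range(d_out)]
        grad_b = [0.0] * d_out
        for x, y in zip(available, missing):
            pred = reconstruct(W, b, x)
            for j in range(d_out):
                err = pred[j] - y[j]
                grad_b[j] += 2 * err / n
                for i in range(d_in):
                    grad_W[j][i] += 2 * err * x[i] / n
        for j in range(d_out):
            b[j] -= lr * grad_b[j]
            for i in range(d_in):
                W[j][i] -= lr * grad_W[j][i]
    return W, b


def mse(W, b, available, missing):
    """Mean squared error between reconstructed and ground-truth embeddings."""
    total = 0.0
    for x, y in zip(available, missing):
        pred = reconstruct(W, b, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
    return total / len(available)


# Synthetic paired embeddings (hypothetical): the "text" embedding is an
# exact affine function of the "audio" embedding, so the map is recoverable.
rng = random.Random(0)
true_W = [[0.5, -1.0], [2.0, 0.3]]
true_b = [0.1, -0.2]
audio_emb = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(50)]
text_emb = [reconstruct(true_W, true_b, x) for x in audio_emb]

W, b = fit_association_map(audio_emb, text_emb)

# At inference time, when text is missing, the reconstructed embedding would
# be passed to the unchanged multimodal model in place of the real one.
text_hat = reconstruct(W, b, [0.2, -0.4])
```

Under these assumptions the multimodal model itself is never retrained; only the small map is fitted on paired embeddings, and `mse` gives one simple way to compare generated and ground-truth features.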
External IDs: dblp:journals/tist/GeraghtyHG25