The Modality Gap in Multimodal Semantic Communication

Published: 26 Jan 2026, Last Modified: 26 Jan 2026
Venue: AAAI 2026 Workshop on ML4Wireless (Poster)
License: CC BY 4.0
Keywords: Multimodal Learning, Semantic Communication, Modality Gap
Abstract: Multimodal semantic communication (SemCom) is a key paradigm for 6G, yet it faces a critical bandwidth bottleneck. Conventional systems transmit a separate latent vector for each of the $M$ modalities, so the transmitted payload grows with $M$. We argue that this inefficiency is a direct consequence of the modality gap: the persistent structural misalignment in contrastively trained latent spaces that prevents a single, unified representation of multiple modalities. We present a theoretical framework in which reducing the modality gap is the key enabler for efficient multimodal compression. With gap-reduction techniques, the $M$ modality-specific embeddings of a single semantic concept collapse into a unified cluster. This alignment enables a novel transmission strategy: sending a single semantic centroid to represent the entire concept, which cuts the transmitted payload to $1/M$ of the per-modality baseline. We demonstrate the effectiveness of the proposed solution on downstream tasks such as classification and reconstruction. This work establishes a clear path toward truly scalable and efficient multimodal semantic communication systems.
Submission Number: 19
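
The centroid transmission strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the centroid is computed, so the code below assumes the simplest option: the mean of the $M$ unit-normalised embeddings, renormalised to unit length. The function names (`semantic_centroid`, `payload_ratio`), the dimensions, and the synthetic "aligned" embeddings are all hypothetical and serve only to show the idea, not the authors' method.

```python
import numpy as np


def semantic_centroid(embeddings: np.ndarray) -> np.ndarray:
    """Return one unit-norm vector summarising M embeddings of shape (M, d).

    Assumption: the centroid is the renormalised mean of unit-norm embeddings.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    return centroid / np.linalg.norm(centroid)


def payload_ratio(num_modalities: int) -> float:
    """Payload of one centroid relative to sending M equal-size vectors."""
    return 1.0 / num_modalities


# Synthetic example: M modality embeddings of the same concept after gap
# reduction, modelled as a shared unit vector plus small per-modality noise.
rng = np.random.default_rng(0)
M, d = 3, 8
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)
aligned = concept + 0.05 * rng.normal(size=(M, d))

centroid = semantic_centroid(aligned)

# Cosine similarity between the centroid and each modality embedding.
unit_aligned = aligned / np.linalg.norm(aligned, axis=1, keepdims=True)
cosines = unit_aligned @ centroid

print("centroid shape:", centroid.shape)
print("payload ratio:", payload_ratio(M))
print("min cosine to centroid:", cosines.min())
```

When the embeddings are tightly clustered, as in this synthetic case, the centroid stays close to every modality embedding, which is the condition under which a single transmitted vector can stand in for all $M$. With a large modality gap the cosine similarities would drop, and the centroid would represent each modality less faithfully.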