Reliable Multimodal Semantic Communication for Audio-Visual Event Localization

Published: 2026 · Last modified: 25 Jan 2026 · IEEE Commun. Lett. 2026 · CC BY-SA 4.0
Abstract: The widespread adoption of smart mobile devices and applications has driven exponential growth in wireless data traffic, posing significant challenges to modern communication systems. Ensuring reliable task-oriented multimodal semantic communication has therefore become increasingly critical. In this letter, we propose RMMSC, a novel framework designed to enhance the effectiveness and reliability of multimodal semantic communication driven by Audio-Visual Event (AVE) localization. Specifically, RMMSC improves the accuracy of multimodal semantic information through advanced semantic encoding and cross-modal feature integration. It employs a two-level coding scheme that combines error-correcting codes with semantic encoders to enhance the reliability of multimodal semantic transmission. As an optional design choice, RMMSC supports a hybrid encryption mechanism to protect transmitted data when required by the application context. Simulation results validate the effectiveness of RMMSC, demonstrating significant improvements in accuracy and reliability on the AVE localization task.
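The abstract's two-level coding idea, a semantic encoder whose output is additionally protected by an error-correcting code, can be illustrated with a minimal sketch. The abstract does not specify RMMSC's actual encoders or codes, so everything below is a hypothetical stand-in: scalar quantization plays the role of the semantic encoder, and a (3,1) repetition code plays the role of the error-correcting layer.

```python
import numpy as np

def semantic_encode(features, n_bits=8):
    """Toy semantic encoder: quantize features in [0, 1) to n_bits each."""
    levels = (features * (1 << n_bits)).astype(int).clip(0, (1 << n_bits) - 1)
    bits = ((levels[:, None] >> np.arange(n_bits)[::-1]) & 1).astype(np.uint8)
    return bits.ravel()

def semantic_decode(bits, n_bits=8):
    """Inverse of the toy encoder: bits back to approximate feature values."""
    levels = bits.reshape(-1, n_bits) @ (1 << np.arange(n_bits)[::-1])
    return levels / (1 << n_bits)

def repetition_encode(bits, r=3):
    """Error-correcting layer: (r,1) repetition code."""
    return np.repeat(bits, r)

def repetition_decode(coded, r=3):
    """Majority vote per group of r; corrects up to (r-1)//2 flips per group."""
    return (coded.reshape(-1, r).sum(axis=1) > r // 2).astype(np.uint8)

rng = np.random.default_rng(0)
features = rng.random(4)

# Transmitter: semantic encoding, then channel (error-correcting) encoding.
tx = repetition_encode(semantic_encode(features))

# Channel: flip the first bit of every group of 3 (within correction capability).
rx = tx.copy()
rx[::3] ^= 1

# Receiver: channel decoding, then semantic decoding.
recovered = semantic_decode(repetition_decode(rx))
```

Under these assumptions the receiver recovers each feature to within the quantization step of 1/256 despite one bit error per codeword group; a real system would replace both layers with learned encoders and stronger codes such as LDPC.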