Incomplete Multimodal Federated Learning via Masking and Contrasting Prototypes

Guangyin Bao, Qi Zhang, Duoqian Miao, Zixuan Gong, Chaochao Chen, Liang Hu, Longbing Cao

Published: 01 Jan 2026, Last Modified: 28 Apr 2026, Venue: IEEE Transactions on Neural Networks and Learning Systems, License: CC BY-SA 4.0

Abstract: In real-world multimodal federated learning (mFL), modalities go missing at random, which poses a significant challenge and degrades the inference performance of the global model. Existing mFL methods, however, are largely limited to simple scenarios in which each client either holds a single modality or holds all modalities in complete form. They employ modality-specific encoders on each client and train modality fusion modules on the server, which causes severe task drift between clients and server and generalizes poorly to intricate modality-missing scenarios. To address this, we present a novel mFL framework that alleviates the task drift and performance degradation caused by modality missingness during both training and inference. Inspired by prototype learning, which uses highly generalized proxies of specific information, we construct a prototype library to enhance FedAvg-based federated learning (FL). We use prototypes as masks that stand in for missing modalities and compensate for the missing modality information, formulating a task-calibrated training loss and devising a model-agnostic modality-incomplete inference strategy. In addition, we construct a proximal term based on prototype contrastive learning to integrate interclient global information into each client, thereby enhancing local training. We conduct extensive experiments to evaluate our mFL framework, demonstrating state-of-the-art performance across a series of missingness settings. Specifically, compared with existing mFL methods, our mFL framework improves inference performance under different modality missingness rates during training and by 23.8% during modality-incomplete inference.
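The two prototype mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a prototype is the class-wise mean embedding of a modality, that a missing modality is marked as `None`, and that the contrastive proximal term takes an InfoNCE-style form with cosine similarity. All function names and the `temperature` parameter are hypothetical, and the sketch assumes the label is known (as in training); the abstract does not say how prototypes are selected at inference.

```python
import math

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(embeddings, labels):
    """Assumed prototype definition: the mean embedding of each class."""
    by_class = {}
    for emb, lab in zip(embeddings, labels):
        by_class.setdefault(lab, []).append(emb)
    return {lab: mean_vectors(embs) for lab, embs in by_class.items()}

def fill_missing(sample, prototypes, label):
    """Use the class prototype as a mask for every modality that is None.

    `sample` maps modality name -> embedding or None;
    `prototypes` maps modality name -> {label: prototype}.
    """
    return {
        modality: emb if emb is not None else prototypes[modality][label]
        for modality, emb in sample.items()
    }

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def prototype_contrastive_loss(embedding, label, global_prototypes, temperature=0.5):
    """InfoNCE-style term (an assumed form): pull the local embedding toward
    the global prototype of its own class and away from the other classes."""
    logits = {
        lab: cosine(embedding, proto) / temperature
        for lab, proto in global_prototypes.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    log_denominator = peak + math.log(
        sum(math.exp(v - peak) for v in logits.values())
    )
    return log_denominator - logits[label]
```

In this sketch a client would call `fill_missing` before fusion and add `prototype_contrastive_loss` to its local objective as the proximal term; how the paper weights that term and calibrates the task loss is not stated in the abstract.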

External IDs: doi:10.1109/tnnls.2026.3658522