MMFed: A Multimodal Federated Learning Framework for Heterogeneous Devices

Published: 01 Jan 2025, Last Modified: 08 Nov 2025. IEEE Internet Things J. 2025. License: CC BY-SA 4.0
Abstract: Existing federated learning (FL) frameworks are primarily designed for single-modal data, yet real-world scenarios require processing multimodal data on heterogeneous devices. This gap between existing methods and real-world deployments significantly impairs model training efficiency. To address it, we propose MMFed, a multimodal FL framework that integrates a multimodal algorithm with a semi-synchronous training method. The multimodal algorithm trains local autoencoders on different data modalities; by exploiting the similarity of encodings across modalities that share the same data labels, these local autoencoders are further trained and aggregated into a global autoencoder, which is then deployed on the blockchain to perform downstream classification tasks. In the semi-synchronous training method, each device updates its parameters independently during a round, and a global aggregation combines the device updates at the end of each round. We empirically evaluate MMFed on several multimodal datasets: the Opportunity (Opp) Challenge, mHealth, and UR Fall Detection datasets. Experimental results show that MMFed outperforms state-of-the-art multimodal frameworks on all three datasets, achieving an average accuracy improvement of 9.07%. Moreover, in terms of training speed, MMFed is markedly faster than synchronous strategies when scaled to a large number of clients.
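The two ideas in the abstract, a cross-modal similarity objective and end-of-round aggregation of independently trained clients, can be illustrated with a minimal sketch. The loss form and the FedAvg-style weighted average below are assumptions for illustration; the paper's exact formulations may differ.

```python
def alignment_loss(enc_a, enc_b):
    # Hypothetical cross-modal term: mean squared distance between the
    # encodings two modality-specific autoencoders produce for samples
    # that share the same label (the paper's actual loss may differ).
    return sum((a - b) ** 2 for a, b in zip(enc_a, enc_b)) / len(enc_a)

def semi_sync_aggregate(client_updates, client_sizes):
    # Semi-synchronous round sketch: clients update their parameters
    # independently during the round; at the round boundary the server
    # combines whatever updates arrived, here as a FedAvg-style average
    # weighted by each client's local data size (an assumed scheme).
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum((n / total) * u[i] for u, n in zip(client_updates, client_sizes))
        for i in range(dim)
    ]

# Toy usage: two clients with data sizes 1 and 3, so weights 0.25 and 0.75.
new_global = semi_sync_aggregate([[1.0, 1.0], [3.0, 3.0]], [1, 3])
# each coordinate: 0.25 * 1.0 + 0.75 * 3.0 = 2.5
```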