Cross-Modal Meta Consensus for Heterogeneous Federated Learning

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: In the evolving landscape of federated learning (FL), the integration of multimodal data presents both unprecedented opportunities and significant challenges. Existing work falls short of meeting the growing demand for systems that can efficiently handle diverse tasks and modalities in rapidly changing environments. We propose a meta-learning strategy tailored for Multimodal Federated Learning (MFL) in a multitask setting, which harmonizes intra-modal and inter-modal feature spaces through the Cross-Modal Meta Consensus. This approach enables seamless integration and transfer of knowledge across different data types, enhancing task personalization within modalities and facilitating effective cross-modal knowledge sharing. Additionally, we introduce Gradient Consistency-based Clustering for multimodal convergence, designed to resolve conflicts at meta-initialization points that arise from diverse modality distributions, and we support it with theoretical guarantees. Our approach, $M^{3}Fed$, evaluated on five federated datasets spanning up to four modalities and four downstream tasks, demonstrates strong performance across diverse data distributions, confirming its effectiveness in multimodal federated learning. The code is available at https://anonymous.4open.science/r/M3Fed-44DB.
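The abstract names two mechanisms but does not spell out their formulations, so the following is a minimal sketch, not the authors' released $M^{3}Fed$ code. It assumes "gradient consistency" is measured by pairwise cosine similarity of flattened client gradients (clients are then grouped by hierarchical clustering), and that each cluster maintains its own meta-initialization updated with a Reptile-style outer step. All names (gradient_consistency_clusters, reptile_meta_update) and hyperparameters (threshold, lr_meta) are illustrative assumptions.

```python
# Hedged sketch of gradient-consistency clustering plus per-cluster
# meta-initialization; an assumption-laden illustration, not the paper's method.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gradient_consistency_clusters(client_grads, threshold=0.5):
    """Group clients whose flattened gradients point in similar directions.

    client_grads: list of 1-D numpy arrays, one per client.
    threshold: cosine-distance cutoff (assumed hyperparameter).
    Returns an array of cluster labels, one per client.
    """
    g = np.stack([v / (np.linalg.norm(v) + 1e-12) for v in client_grads])
    cos_sim = np.clip(g @ g.T, -1.0, 1.0)   # pairwise cosine similarity
    dist = 1.0 - cos_sim                    # cosine distance matrix
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    return fcluster(linkage(condensed, method="average"),
                    t=threshold, criterion="distance")


def reptile_meta_update(meta_init, client_weights, lr_meta=0.1):
    """One Reptile-style outer step: move a cluster's meta-initialization
    toward the mean of its members' locally adapted weights."""
    avg = np.mean(np.stack(client_weights), axis=0)
    return meta_init + lr_meta * (avg - meta_init)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    # Two synthetic "modalities": gradients drawn around two base directions.
    d1, d2 = rng.normal(size=dim), rng.normal(size=dim)
    grads = [d1 + 0.1 * rng.normal(size=dim) for _ in range(3)] + \
            [d2 + 0.1 * rng.normal(size=dim) for _ in range(3)]
    labels = gradient_consistency_clusters(grads)
    print("cluster labels:", labels)  # clients with consistent gradients group together

    # One meta-initialization per cluster, updated from member weights.
    meta = {c: np.zeros(dim) for c in set(labels)}
    fake_local_weights = [g * 0.01 for g in grads]  # stand-in for adapted weights
    for c in meta:
        members = [w for w, l in zip(fake_local_weights, labels) if l == c]
        meta[c] = reptile_meta_update(meta[c], members)
```

Cosine similarity is used here (rather than raw gradient distance) because it is insensitive to the gradient-magnitude differences that heterogeneous modalities typically produce; whether the paper makes the same choice is not stated in the abstract.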
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Engagement] Emotional and Social Signals, [Content] Media Interpretation
Relevance To Conference: This paper contributes to the multimodal domain by introducing a multimodal multitask federated learning framework built on meta-learning. The framework exploits information across modalities and tasks to train and optimize models on multimodal data, improving training efficiency and generalization over traditional methods while preserving user privacy. It offers a fresh perspective for handling multimodal data, particularly in cross-modal tasks where distributed privacy is a concern, and its effectiveness is demonstrated through empirical validation on multimodal data analysis.
Supplementary Material: zip
Submission Number: 4420