Client-Aware Multimodal Distillation with Adaptive Aggregation for Robust Federated Learning in Noisy and Adversarial Environments
Keywords: Federated Learning, Knowledge Distillation, Multimodal Representation Learning, Adversarial Robustness, Adaptive Aggregation
TL;DR: We propose a client-aware multimodal distillation method with adaptive aggregation that combines adversarial training and semantic alignment to enable robust federated learning under noisy, non-IID conditions.
Abstract: Federated learning (FL) faces critical challenges in real-world deployments due to data heterogeneity, label noise, and susceptibility to adversarial inputs. Conventional distillation-based aggregation methods often assume uniform reliability among clients, overlooking disparities in representation quality and semantic alignment. In this work, we propose a client-aware multimodal distillation framework to enhance the robustness and semantic alignment of learned representations in FL systems. Our approach integrates a lightweight MobileNetV3 vision encoder with a CLIP-based textual prompt encoder, promoting cross-modal consistency through joint supervision. To improve resilience, each client performs adversarial training with gradient-based perturbations, strengthening the model against input manipulations. At the core of our framework is the Client-Aware Attention Aggregation (CAAA) module, which dynamically adjusts client contributions based on the cosine similarity of intermediate features and causal attribution gradients. This dual-guided weighting strategy enables the student model to selectively incorporate information from semantically consistent and informative clients while suppressing unreliable updates. We evaluate the proposed method on various benchmark datasets under non-IID partitioning with adversarial and noisy conditions. The experimental results demonstrate consistent gains in precision and robustness across a variety of distillation strategies and adaptive aggregation methods, highlighting the effectiveness of our framework for trustworthy federated learning.
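The submission text above does not spell out how the dual-guided CAAA weighting is computed, so the following is only a minimal sketch of one plausible realisation: cosine similarity between client features and a reference feature is blended with a gradient-based attribution score, and a softmax over the blended scores yields client attention weights. All names and parameters here (caaa_weights, alpha, temperature, the normalisation of attribution scores) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def caaa_weights(client_feats, ref_feat, client_attrib, alpha=0.5, temperature=1.0):
    """Hypothetical sketch of Client-Aware Attention Aggregation (CAAA).

    client_feats:  (K, D) intermediate features, one row per client
    ref_feat:      (D,)   reference (e.g., server/student) feature
    client_attrib: (K,)   gradient-based attribution score per client
    Returns a (K,) attention vector that sums to 1.
    """
    # Semantic consistency cue: cosine similarity to the reference feature.
    sim = F.cosine_similarity(client_feats, ref_feat.unsqueeze(0), dim=1)          # (K,)
    # Standardise attribution scores so both cues share a comparable scale (assumed choice).
    attrib = (client_attrib - client_attrib.mean()) / (client_attrib.std() + 1e-8)  # (K,)
    # Dual-guided score: blend the two cues, then convert to attention weights.
    score = alpha * sim + (1.0 - alpha) * attrib
    return F.softmax(score / temperature, dim=0)

def aggregate_logits(client_logits, weights):
    """Weighted combination of per-client teacher logits for distillation."""
    # client_logits: (K, C); weights: (K,)
    return (weights.unsqueeze(1) * client_logits).sum(dim=0)

if __name__ == "__main__":
    K, D, C = 5, 128, 10
    feats, ref = torch.randn(K, D), torch.randn(D)
    attrib, logits = torch.rand(K), torch.randn(K, C)
    w = caaa_weights(feats, ref, attrib)
    fused = aggregate_logits(logits, w)          # soft target for the student model
    print(w, fused.shape)
```

Under these assumptions, clients whose intermediate features diverge from the reference or carry low attribution receive small weights, so their logits contribute little to the fused distillation target; how the actual framework computes the attribution gradients and the reference feature is left to the full paper.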
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 19315