Keywords: Federated Learning, Federated Continual Learning, Prompt Learning, Vision-Language Model
Abstract: Pretrained vision-language models (VLMs), such as CLIP, have shown promise in federated learning (FL) by bringing strong multimodal representations to edge devices. However, continual adaptation remains a core challenge in practical federated settings, where task distributions evolve over time and data remain non-IID across clients. In this emerging area, recent works adopt parameter-efficient fine-tuning (PEFT) as a lightweight way to reduce communication overhead, yet they fail to maintain satisfactory performance under continual learning conditions. Meanwhile, traditional federated continual learning (FCL) methods lack the capacity to preserve the cross-modal alignment crucial to VLM performance. We introduce Fed-Duet, a novel Dual-channel Expert-orchestrated framework for efficient federated continual learning in vision-language models. Fed-Duet features a dual-channel adaptation mechanism that combines server-coordinated semantic prompts with client-personalized modular adapters. The two channels are dynamically fused via a cross-attention mechanism, enabling effective knowledge transfer while preserving multimodal alignment and mitigating forgetting. We evaluate Fed-Duet across multiple challenging continual learning tasks in federated vision-language settings and demonstrate that it achieves superior performance and stability compared to existing approaches. Our work highlights the importance of coordinated expert composition in enabling scalable and robust multimodal continual learning. The code is available at https://anonymous.4open.science/r/FedDuet-0426/.
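To illustrate the dual-channel fusion described above, the following is a minimal PyTorch sketch of how shared prompt tokens and client-local adapter features could be combined via cross-attention. The module name (DualChannelFusion), tensor shapes, dimensions, and residual design are assumptions made for this sketch, not details taken from the paper or its released code.

```python
import torch
import torch.nn as nn


class DualChannelFusion(nn.Module):
    """Hypothetical sketch: fuse server-coordinated semantic prompts (shared
    channel) with client-personalized adapter features (local channel) using
    cross-attention, so shared prompts attend to local features."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Prompt tokens act as queries; adapter features act as keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, prompt_tokens: torch.Tensor, adapter_feats: torch.Tensor) -> torch.Tensor:
        # prompt_tokens: (batch, n_prompts, dim) -- server-coordinated channel
        # adapter_feats: (batch, n_tokens, dim)  -- client-personalized channel
        fused, _ = self.cross_attn(prompt_tokens, adapter_feats, adapter_feats)
        # Residual connection keeps the shared prompts' semantics intact while
        # injecting client-specific information.
        return self.norm(prompt_tokens + fused)


if __name__ == "__main__":
    fusion = DualChannelFusion()
    prompts = torch.randn(4, 16, 512)   # 16 shared prompt tokens per sample
    features = torch.randn(4, 196, 512) # e.g. ViT patch features after adapters
    print(fusion(prompts, features).shape)  # torch.Size([4, 16, 512])
```

In this sketch the residual path is one plausible way to keep the globally aggregated prompts dominant while letting local adapters modulate them; the actual Fed-Duet fusion may differ.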
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 12707