$1+1<1$? Breaking the Standalone Barrier in Federated Fine-Tuning of Multimodal Large Language Models under Non-IID Data
Keywords: Multimodal Large Language Models, Federated Learning, Fine-Tuning, Non-IID Data
Abstract: Federated fine-tuning of multimodal large language models faces significant challenges in communication costs, which can be addressed by Low-Rank Adaptation (LoRA). Existing methods typically allow all clients to collaboratively learn and share a single LoRA adapter. However, we identify a long-overlooked issue: under non-IID data, federated fine-tuning can even underperform standalone local training ("$1+1<1$"). Strikingly, much of the literature still focuses on surpassing SOTA Federated Learning (FL) methods, while neglecting the more fundamental requirement that any effective FL approach should at least outperform standalone local training. To address this, we propose a novel method termed Federated Mixture of LoRA Experts (Fed-MoLE). It adopts a hybrid mixture-of-LoRA-experts architecture with an alternating disentanglement–alignment mechanism. This design enables the model to disentangle diverse instance-level variations through dynamically routed LoRA experts, and then align cross-client knowledge into a unified global representation, thus enhancing robustness under non-IID data. Extensive experiments on two benchmarks show that Fed-MoLE consistently surpasses both SOTA FL baselines and standalone local training, effectively breaking the "$1+1<1$" barrier in federated fine-tuning of multimodal large language models under non-IID data.
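To make the mixture-of-LoRA-experts idea in the abstract concrete, below is a minimal PyTorch sketch of a frozen linear layer augmented with several LoRA experts and a learned router that mixes them per instance. This is not the authors' Fed-MoLE implementation; the class name `MoLELinear`, the hyperparameters (`num_experts`, `rank`, `alpha`), and the softmax routing are illustrative assumptions, and the paper's alternating disentanglement–alignment mechanism and federated aggregation are not reproduced here.

```python
# Illustrative sketch only: a mixture-of-LoRA-experts linear layer with a
# learned router. Hypothetical names and hyperparameters; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLELinear(nn.Module):
    """Frozen base linear layer plus several LoRA experts.

    Each expert is a low-rank pair (A_i, B_i); a router produces
    instance-level mixture weights over the experts.
    """

    def __init__(self, in_features: int, out_features: int,
                 num_experts: int = 4, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.scaling = alpha / rank
        # Low-rank expert factors: delta_W_i = B_i @ A_i
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, out_features, rank))
        # Router maps each input representation to expert mixture weights
        self.router = nn.Linear(in_features, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features)
        gate = F.softmax(self.router(x), dim=-1)              # (batch, num_experts)
        low = torch.einsum("bi,eri->ber", x, self.lora_A)     # (batch, E, rank)
        upd = torch.einsum("ber,eor->beo", low, self.lora_B)  # (batch, E, out)
        lora_out = (gate.unsqueeze(-1) * upd).sum(dim=1)      # mixture over experts
        return self.base(x) + self.scaling * lora_out


if __name__ == "__main__":
    layer = MoLELinear(in_features=32, out_features=64)
    y = layer(torch.randn(4, 32))
    print(y.shape)  # torch.Size([4, 64])
```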
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2955