Keywords: Federated Learning, Resource-adaptive, Mixture-of-Experts
Abstract: Existing federated fine-tuning methods for large-scale foundation models (FMs) assign heterogeneous low-rank adaptation (LoRA) ranks to clients based on their computation capabilities to address system heterogeneity. However, these approaches require merging the LoRA matrices into the original model to obtain the full model, incurring computational overhead for resource-constrained clients at inference time. Moreover, their performance falls short of that of homogeneous LoRA, in which the lowest rank is applied to all clients.
To overcome these limitations, we propose a resource-adaptive federated fine-tuning method by revisiting the conditional computation property of Sparsely-activated Mixture-of-Experts (SMoE). The key principle is to extend the data-conditional computation property of SMoE to a new dimension: resource-conditional computation, where each client activates a suitable number of experts depending on its available resources. Furthermore, to address the imbalanced expert utilization caused by heterogeneous expert activation patterns, we propose a new activation-aware aggregation algorithm for SMoE (A3SMoE), which enhances the aggregation process by incorporating client-specific expert activation patterns. Through experiments across independent and identically distributed (IID) and non-IID scenarios, we demonstrate that our proposed method achieves superior performance compared to both homogeneous- and heterogeneous-LoRA approaches under different computation budgets. We also show that LoRA-based methods can be improved when integrated with A3SMoE.
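The abstract's two mechanisms can be illustrated with a minimal sketch: clients activate only their top-k experts, with k chosen per client budget, and the server aggregates each expert weighted by how often each client activated it. The functions `client_topk` and `a3smoe_aggregate` below, and the specific count-proportional weighting, are assumptions for illustration only; the paper's exact routing and aggregation rules are not given in the abstract.

```python
import numpy as np

def client_topk(router_logits, k):
    """Resource-conditional routing sketch: a client activates only its
    top-k experts per token, where k is set by the client's compute budget.
    (Illustrative; not the paper's exact routing rule.)"""
    topk_idx = np.argsort(router_logits, axis=-1)[..., -k:]
    mask = np.zeros_like(router_logits)
    np.put_along_axis(mask, topk_idx, 1.0, axis=-1)
    return mask  # 1 where an expert is activated, 0 otherwise

def a3smoe_aggregate(client_expert_weights, client_activation_counts):
    """Hypothetical activation-aware aggregation: each expert's global
    parameters are a weighted average of client copies, weighted by how
    often each client actually activated that expert."""
    num_clients = len(client_expert_weights)
    num_experts = client_expert_weights[0].shape[0]
    agg = np.zeros_like(client_expert_weights[0])
    for e in range(num_experts):
        counts = np.array(
            [client_activation_counts[c][e] for c in range(num_clients)],
            dtype=float,
        )
        if counts.sum() == 0:
            # No client activated this expert: fall back to a plain average.
            w = np.full(num_clients, 1.0 / num_clients)
        else:
            w = counts / counts.sum()
        for c in range(num_clients):
            agg[e] += w[c] * client_expert_weights[c][e]
    return agg

# Toy usage: 3 clients, 4 experts, expert parameters of dimension 2.
weights = [np.random.randn(4, 2) for _ in range(3)]
counts = [np.array([10, 0, 5, 1]), np.array([2, 8, 0, 0]), np.array([0, 0, 3, 7])]
print(a3smoe_aggregate(weights, counts))
```

Under this reading, experts that a resource-constrained client rarely or never activates contribute little of that client's (under-trained) copy to the global model, which is one plausible way to counteract the imbalanced expert utilization the abstract describes.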
Submission Number: 24