Heterogeneous LoRA for Federated Fine-tuning of On-device Foundation Models

Yae Jee Cho; Luyang Liu; Zheng Xu; Aldi Fahrezi; Matt Barnes; Gauri Joshi

Published: 28 Oct 2023, Last Modified: 10 Dec 2023

Venue: FL@FM-NeurIPS’23 Poster

Student Author Indication: Yes

Keywords: On-device Foundation Models, Federated Fine-Tuning, Low Rank Approximation

Abstract: Foundation models (FMs) with massive parameter spaces, pretrained on large amounts of (public) data, perform remarkably well on various downstream tasks with just a few samples for fine-tuning. However, directly fine-tuning standard FMs is often difficult because of their size, especially when FMs are adapted on private data distributed across resource-limited devices. As a result, only FMs with relatively small parameter counts may be capable of on-device fine-tuning. We call these smaller FMs *on-device FMs (ODFMs)*. In our work, we investigate parameter-efficient federated fine-tuning of ODFMs (XXS PaLM2) using low-rank approximations (LoRAs), with multi-session chat data from real clients as the downstream task of interest. We first examine federated fine-tuning with homogeneous LoRA ranks across clients, and show that higher ranks learn faster but can overfit, whereas lower ranks do not overfit but converge more slowly in training. Based on these observations, we propose heterogeneous LoRA, where we deploy *heterogeneous ranks* across clients, aggregate the heterogeneous LoRA modules through zero-padding, and redistribute the LoRA modules heterogeneously through truncation. Our proposed heterogeneous LoRA is simple yet effective. It achieves the best of both worlds by combining the advantages of high-rank and low-rank LoRAs: it reaches the best performance in the fewest communication rounds while also avoiding overfitting.
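
The zero-padding aggregation and truncation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the factor names `B` (shape d x r) and `A` (shape r x k), the helper names `pad_to_rank`, `aggregate`, and `truncate`, the uniform averaging weights, and the toy dimensions are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def pad_to_rank(B, A, r_max):
    """Zero-pad LoRA factors B (d x r) and A (r x k) up to rank r_max."""
    d, r = B.shape
    _, k = A.shape
    B_pad = np.zeros((d, r_max), dtype=B.dtype)
    A_pad = np.zeros((r_max, k), dtype=A.dtype)
    B_pad[:, :r] = B   # extra columns stay zero
    A_pad[:r, :] = A   # extra rows stay zero
    return B_pad, A_pad

def aggregate(modules, weights=None):
    """Average zero-padded modules; uniform weights are an assumption."""
    r_max = max(B.shape[1] for B, _ in modules)
    if weights is None:
        weights = np.full(len(modules), 1.0 / len(modules))
    padded = [pad_to_rank(B, A, r_max) for B, A in modules]
    B_glob = sum(w * B for w, (B, _) in zip(weights, padded))
    A_glob = sum(w * A for w, (_, A) in zip(weights, padded))
    return B_glob, A_glob

def truncate(B, A, r):
    """Keep only the first r rank components for a rank-r client."""
    return B[:, :r].copy(), A[:r, :].copy()

# Toy round with three hypothetical clients of ranks 2, 4, and 8.
rng = np.random.default_rng(0)
d, k = 16, 12
ranks = [2, 4, 8]
modules = [(rng.normal(size=(d, r)), rng.normal(size=(r, k))) for r in ranks]

B_glob, A_glob = aggregate(modules)          # global module has rank 8
B_small, A_small = truncate(B_glob, A_glob, ranks[0])
print(B_glob.shape, A_glob.shape, B_small.shape, A_small.shape)
```

In this sketch, rank components beyond a client's own rank receive contributions only from the higher-rank clients, and each client gets back a module truncated to its own rank.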

Submission Number: 46
