Heterogeneous LoRA for Federated Fine-tuning of On-device Foundation Models
Keywords: On-device Foundation Models, Federated Fine-Tuning, Low Rank Approximation
Abstract: Foundation models (FMs), pretrained on large amounts of (public) data in a massive parameter space, perform remarkably well on various downstream tasks with only a few fine-tuning samples. However, directly fine-tuning standard FMs is often infeasible due to their massive size, especially when FMs must be adapted on private data distributed across resource-limited devices. Consequently, only FMs with relatively small parameter sizes are capable of on-device fine-tuning; we refer to these smaller FMs as *on-device FMs (ODFMs)*. In this work, we investigate parameter-efficient federated fine-tuning of ODFMs (XXS PaLM2) on devices using low-rank adaptation (LoRA), taking multi-session chat data from real clients as the downstream task of interest. We first examine federated fine-tuning with homogeneous LoRA ranks across clients, and show that higher ranks learn faster but can lead to overfitting, while lower ranks avoid overfitting but converge more slowly in training. Based on these observations, we propose heterogeneous LoRA, in which we deploy *heterogeneous ranks* across clients, aggregate the heterogeneous LoRA modules through zero-padding, and redistribute the aggregated modules heterogeneously through truncation. Our proposed heterogeneous LoRA is simple yet effective: it combines the advantages of high-rank and low-rank LoRA, achieving the best performance with the fewest communication rounds while avoiding overfitting.
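The aggregation scheme described in the abstract — zero-padding heterogeneous-rank LoRA factors to a common rank, combining them, then truncating back to each client's local rank — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the plain (unweighted) averaging, and the toy dimensions are assumptions for exposition.

```python
import numpy as np


def aggregate_heterogeneous_lora(As, Bs, r_max):
    """Zero-pad each client's LoRA factors to rank r_max, then average.

    A_i has shape (r_i, k) and B_i has shape (d, r_i), so that the
    client's low-rank update is B_i @ A_i.  Plain averaging here is an
    illustrative assumption; a weighted scheme could be used instead.
    """
    padded_A, padded_B = [], []
    for A, B in zip(As, Bs):
        r = A.shape[0]
        # Pad the rank dimension with zeros up to the maximum rank.
        padded_A.append(np.pad(A, ((0, r_max - r), (0, 0))))
        padded_B.append(np.pad(B, ((0, 0), (0, r_max - r))))
    return np.mean(padded_A, axis=0), np.mean(padded_B, axis=0)


def redistribute(A_agg, B_agg, r_client):
    """Truncate the aggregated factors back to a client's local rank."""
    return A_agg[:r_client, :], B_agg[:, :r_client]


# Toy example: two clients with ranks 2 and 4 on a (d=8, k=6) layer.
As = [np.ones((2, 6)), np.ones((4, 6))]
Bs = [np.ones((8, 2)), np.ones((8, 4))]
A_agg, B_agg = aggregate_heterogeneous_lora(As, Bs, r_max=4)
A_small, B_small = redistribute(A_agg, B_agg, r_client=2)
```

The zero-padded rank components contribute nothing to the low-rank product `B @ A`, so padding leaves each client's update unchanged while making the factors dimensionally compatible for server-side averaging; truncation then returns to each client only the leading components its local rank can hold.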
Student Author Indication: Yes
Submission Number: 46