Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Federated Learning, LoRA, Fine-Tuning, Efficiency
TL;DR: We propose a parameter-efficient fine-tuning method that improves the performance of LoRA in the presence of computational and data heterogeneity in FL.
Abstract: Large Language Models (LLMs) have yet to effectively leverage the vast amounts of edge-device data, and Federated Learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computational and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients. We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads, $s_i\textbf{B}_i\textbf{H}_i\textbf{A}_i$, in which only the $\textbf{H}_i$ parameters and their lightweight scaling factors $s_i$ are trained. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters since clients upload $s_i\textbf{H}_i$ directly. Experiments on vision and language benchmarks show that Ravan improves test accuracy by 2–8% over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.
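To make the reparameterization concrete, below is a minimal PyTorch sketch of a multi-head LoRA layer in the spirit of the abstract: each head contributes $s_i\textbf{B}_i\textbf{H}_i\textbf{A}_i$ to the frozen base weight, and only the $\textbf{H}_i$ cores and the scalars $s_i$ receive gradients. The class name, initialization scheme, and hyperparameters here are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class RavanLinear(nn.Module):
    """Sketch of a Ravan-style multi-head LoRA layer (illustrative only).

    The frozen base weight W is augmented with num_heads heads, each
    adding s_i * B_i @ H_i @ A_i; only H_i and s_i are trainable.
    """

    def __init__(self, base: nn.Linear, num_heads: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base model stays frozen

        d_out, d_in = base.weight.shape
        # Frozen projection factors A_i and B_i (random init assumed here).
        self.A = nn.Parameter(torch.randn(num_heads, rank, d_in) / rank**0.5,
                              requires_grad=False)
        self.B = nn.Parameter(torch.randn(num_heads, d_out, rank) / rank**0.5,
                              requires_grad=False)
        # Trainable r x r cores H_i, zero-initialized so fine-tuning
        # starts from the pretrained W, plus trainable scales s_i.
        self.H = nn.Parameter(torch.zeros(num_heads, rank, rank))
        self.s = nn.Parameter(torch.ones(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        for i in range(self.H.shape[0]):
            # Accumulate s_i * B_i H_i A_i x over all heads.
            out = out + self.s[i] * (x @ self.A[i].T @ self.H[i].T @ self.B[i].T)
        return out
```

Under these assumptions, a client's upload per adapted layer is the set of products $s_i\textbf{H}_i$, i.e. on the order of num_heads × rank² scalars, rather than the rank × (d_in + d_out) scalars of a standard LoRA update of comparable expressivity.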
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 24081