Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation for Federated Learning

Published: 10 Jun 2025 · Last Modified: 01 Jul 2025
Venue: TTODLer-FM @ ICML 2025 (Poster)
License: CC BY 4.0
Keywords: Federated Learning, Low-Rank Adaptation, Non-convex Optimization
TL;DR: We present RAC-LoRA, a low-rank optimization framework with provable guarantees of convergence to the same solution as full-parameter fine-tuning, and we adapt it to Federated Learning.
Abstract: Fine-tuning has become a popular approach for adapting large foundation models to specific downstream tasks. With growing model sizes and dataset scales, parameter-efficient fine-tuning techniques are gaining importance for practical applications such as Federated Learning (FL). One of the most widely used parameter-efficient fine-tuning methods is Low-Rank Adaptation (LoRA), in which the adaptation update is expressed as the product of two low-rank matrices. While LoRA delivers strong fine-tuning performance, it often under-performs compared to full-parameter fine-tuning (FPFT). Although many variants of LoRA have been studied extensively in empirical work, their theoretical optimization analysis remains heavily under-explored. The starting point of our work is a demonstration that LoRA and two of its extensions, Asymmetric LoRA and Chain of LoRA, indeed encounter convergence issues. To address these issues, we propose Randomized Asymmetric Chain of LoRA (RAC-LoRA), a general optimization framework within which the convergence rates of LoRA-based methods can be rigorously analyzed. Our approach inherits the empirical benefits of LoRA-style heuristics but introduces several important algorithmic modifications that yield a provably convergent method. Our framework serves as a bridge between FPFT and low-rank adaptation: we provide provable guarantees of convergence to the same solution as FPFT, along with the rate of convergence. Additionally, we present a convergence analysis for smooth, non-convex loss functions, covering gradient descent, stochastic gradient descent, and Federated Learning settings. Our theoretical findings are supported by experimental results.
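For context on the setup the abstract refers to, below is a minimal sketch of the standard LoRA reparameterization, in which the adaptation update is the product of two low-rank matrices. This is an illustrative example only; the dimensions, initialization, and variable names are assumptions, and it does not reproduce the randomized, asymmetric, or chained updates analyzed in RAC-LoRA.

```python
import numpy as np

# Minimal illustration of the standard LoRA reparameterization (not the
# RAC-LoRA algorithm itself): the adaptation update is a product of two
# low-rank matrices, so only r * (m + n) parameters are trained instead
# of the m * n parameters touched by full-parameter fine-tuning (FPFT).
m, n, r = 512, 768, 8                    # layer dimensions and LoRA rank, r << min(m, n)

rng = np.random.default_rng(0)
W0 = rng.standard_normal((m, n))         # frozen pre-trained weight (never updated)
B = np.zeros((m, r))                     # low-rank factor, initialized to zero
A = rng.standard_normal((r, n)) * 0.01   # low-rank factor, small random initialization

# Effective weight used in the forward pass: the frozen weight plus the
# low-rank update B @ A. Only A and B would receive gradient updates.
W_eff = W0 + B @ A

# Trainable parameter counts: full-parameter fine-tuning vs. LoRA-style adaptation.
print("full-parameter:", m * n)          # 393216
print("LoRA (rank 8): ", r * (m + n))    # 10240
```

The zero initialization of one factor is a common LoRA convention so that the adapted model initially matches the pre-trained one; asymmetric and chained variants differ in which factor is trained or resampled across rounds.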
Submission Number: 30