FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Federated Learning, LoRA, Differential Privacy
TL;DR: FedSVD mitigates noise amplification in differentially private federated fine-tuning with LoRA by periodically refactorizing LoRA matrices via SVD, improving stability and performance under DP-SGD.
Abstract: Low-Rank Adaptation (LoRA), which introduces a product of two trainable low-rank matrices into frozen pre-trained weights, is widely used for efficient fine-tuning of language models in federated learning (FL). However, when combined with differentially private stochastic gradient descent (DP-SGD), LoRA faces substantial noise amplification: DP-SGD perturbs per-sample gradients, and the matrix multiplication of the LoRA update ($BA$) intensifies this effect. Freezing one matrix (*e.g.*, $A$) reduces the noise but restricts model expressiveness, often resulting in suboptimal adaptation. To address this, we propose $\texttt{FedSVD}$, a simple yet effective method that introduces a global reparameterization based on singular value decomposition (SVD). In our approach, each client optimizes only the $B$ matrix and transmits it to the server. The server aggregates the $B$ matrices, computes the product $BA$ using the previous $A$, and refactorizes the result via SVD. This yields a new adaptive $A$ composed of the orthonormal right singular vectors of $BA$, and an updated $B$ containing the remaining SVD components. This reparameterization avoids quadratic noise amplification, while allowing $A$ to better capture the principal directions of the aggregate updates. Moreover, the orthonormal structure of $A$ bounds the gradient norms of $B$ and preserves more signal under DP-SGD, as confirmed by our theoretical analysis. As a result, $\texttt{FedSVD}$ consistently improves stability and performance across a variety of privacy settings and benchmarks, outperforming relevant baselines under both private and non-private regimes.
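Illustrative sketch (not from the paper): the server-side refactorization described in the abstract can be pictured as averaging the client $B$ matrices, forming the product with the previous $A$, and re-factorizing via SVD so that the new $A$ holds orthonormal right singular vectors. The snippet below is a minimal NumPy sketch under the assumption of plain averaging; the function name `server_refactorize` and the aggregation rule are illustrative, not the authors' implementation.

```python
import numpy as np

def server_refactorize(client_Bs, A_prev, rank):
    """Sketch of the server step described in the abstract.

    Clients send only their B matrices; the server aggregates them,
    forms the low-rank update B @ A with the previous A, and
    re-factorizes it via SVD so the new A has orthonormal rows.
    """
    # Aggregate client updates (simple average here; the paper's exact
    # aggregation rule may differ).
    B_avg = np.mean(np.stack(client_Bs), axis=0)          # (d_out, r)

    # Product update using the previous shared A.
    delta = B_avg @ A_prev                                 # (d_out, d_in)

    # Truncated SVD of the aggregate update.
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]

    # New A: orthonormal right singular vectors of delta.
    # New B: remaining SVD factors, so B_new @ A_new == delta (up to truncation).
    A_new = Vt
    B_new = U * S                                          # scale columns of U by S
    return B_new, A_new

# Toy usage: two clients, rank-4 LoRA factors for a 16 x 32 weight.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((4, 32))
client_Bs = [rng.standard_normal((16, 4)) for _ in range(2)]
B_new, A_new = server_refactorize(client_Bs, A_prev, rank=4)
assert np.allclose(A_new @ A_new.T, np.eye(4), atol=1e-6)  # rows of A_new are orthonormal
```

The orthonormality check at the end mirrors the abstract's point that the new $A$ is orthonormal, which is what bounds the gradient norms of $B$ under DP-SGD.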
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 16334