OD-LoRA: Overcoming the Dilemma between Weight Representation and Gradient Approximation in Low-Rank Adaptation

Junghun Oh; Sungyong Baik; Kyoung Mu Lee

11 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: low-rank adaptation, low-rank gradient approximation

TL;DR: We theoretically show that LoRA faces a dilemma between weight update representation and gradient approximation and propose a new method to overcome this.

Abstract: Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by representing weight updates with trainable low-rank (LR) matrices. Recent studies offer a different perspective: learning with LoRA is equivalent to using a low-rank approximation of the full fine-tuning gradient, obtained by mapping it onto low-rank subspaces through the LR matrices. In this paper, we theoretically show that LoRA faces a dilemma between these two perspectives: weight update representation and gradient approximation. We first demonstrate that the quality of gradient approximation improves when the LR matrices have uniform singular values, since non-uniform singular values anisotropically distort the projection of the full gradient onto the subspaces. However, this condition imposes a strict constraint on the weight updates, significantly compromising their representational capacity. To Overcome this Dilemma, we introduce a new method, named OD-LoRA, which decouples the approximated gradient from the singular values of the LR matrices. Specifically, OD-LoRA ensures that the full gradient is mapped through the orthonormal bases of the low-rank subspaces defined by the LR matrices, achieving perfect projection onto the subspaces, while still allowing the singular values to represent the weight updates. Consequently, OD-LoRA simultaneously achieves the optimal condition for accurate gradient approximation and an unconstrained representation of weight updates. Experimental results on natural language and vision benchmarks demonstrate that OD-LoRA improves loss convergence and gradient approximation quality, significantly enhancing the adaptation performance of LoRA.
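The distortion described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the abstract does not state OD-LoRA's update rule. It relies only on the standard chain-rule identity for W = W0 + BA, where a gradient step on B alone changes W along (G Aᵀ) A, with G the full fine-tuning gradient. The function names, matrix sizes, and the singular values 3.0 and 0.2 are illustrative assumptions.

```python
import numpy as np

def b_side_equivalent_gradient(G, A):
    """Weight-space direction induced by a gradient step on B alone: (G A^T) A.

    Follows from dL/dB = G A^T for W = W0 + B A (standard chain rule).
    """
    return G @ A.T @ A

def row_space_projection(G, A):
    """Orthogonal projection of the rows of G onto the row space of A.

    Uses an orthonormal basis Q of row(A), so singular values of A play no role.
    """
    Q, _ = np.linalg.qr(A.T)  # Q is n x r with orthonormal columns
    return G @ Q @ Q.T

rng = np.random.default_rng(0)
m, n, r = 8, 6, 2                      # illustrative sizes (assumption)
G = rng.standard_normal((m, n))        # stand-in for the full gradient

# Build a rank-r matrix A with deliberately non-uniform singular values.
U, _ = np.linalg.qr(rng.standard_normal((r, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = U @ np.diag([3.0, 0.2]) @ V.T

G_lora = b_side_equivalent_gradient(G, A)  # scaled by squared singular values
G_proj = row_space_projection(G, A)        # undistorted projection

err_lora = np.linalg.norm(G - G_lora)
err_proj = np.linalg.norm(G - G_proj)
print(f"error with singular-value scaling: {err_lora:.4f}")
print(f"error with orthonormal basis:      {err_proj:.4f}")

# With uniform (unit) singular values the two directions coincide.
A_uniform = V.T
same = np.allclose(b_side_equivalent_gradient(G, A_uniform),
                   row_space_projection(G, A_uniform))
print("coincide for uniform singular values:", same)
```

Both directions lie in the same low-rank subspace, so the orthogonal projection is never a worse approximation of G than the singular-value-scaled one. The sketch covers only the B-side term; the A-side term B Bᵀ G behaves symmetrically.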

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 3878
