OD-LoRA: Overcoming the Dilemma between Weight Representation and Gradient Approximation in Low-Rank Adaptation
Keywords: low-rank adaptation, low-rank gradient approximation
TL;DR: We theoretically show that LoRA faces a dilemma between weight update representation and gradient approximation and propose a new method to overcome this.
Abstract: Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by representing weight updates with trainable low-rank (LR) matrices.
Recent studies offer a different perspective: learning with LoRA is equivalent to using a low-rank approximation of the full fine-tuning gradient, obtained by mapping it onto low-rank subspaces through the LR matrices.
In this paper, we theoretically show that LoRA faces a dilemma between these two perspectives: weight update representation and gradient approximation.
We first demonstrate that the quality of gradient approximation is improved if the LR matrices have uniform singular values, since non-uniform singular values anisotropically distort the projection of the full gradient onto the subspaces.
However, this condition entails a strict constraint on the weight updates, significantly compromising their representational capacity.
To Overcome this Dilemma, we introduce a new method, named OD-LoRA, which decouples the approximated gradient from the singular values of the LR matrices.
Specifically, OD-LoRA ensures that the full gradient is mapped through the orthonormal bases of the low-rank subspaces defined by the LR matrices, achieving perfect projection onto the subspaces, while still allowing the singular values to represent the weight updates.
Consequently, OD-LoRA achieves both the optimal condition for accurate gradient approximation and unconstrained representation of weight updates simultaneously.
Experimental results on natural language and vision benchmarks demonstrate that OD-LoRA improves loss convergence and gradient approximation quality, significantly enhancing the adaptation performance of LoRA.
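The role of the singular values can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the OD-LoRA algorithm itself: it assumes the usual LoRA factorization W = W0 + BA, writes the full fine-tuning gradient as G, and looks only at the one-sided case where A alone is updated, so the weight change is proportional to B Bᵀ G. The names `lora_mapped_gradient`, `orthonormal_projection`, and `cosine` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def lora_mapped_gradient(G, B):
    # Assumed one-sided setting: a gradient step on A alone changes W = W0 + B @ A
    # along B @ (B.T @ G), i.e. G mapped through B and its singular values.
    return B @ (B.T @ G)

def orthonormal_projection(G, B):
    # Orthogonal projection of G onto the column space of B, using an
    # orthonormal basis Q from a reduced QR factorization (B has full column rank).
    Q, _ = np.linalg.qr(B)
    return Q @ (Q.T @ G)

def cosine(X, Y):
    # Cosine similarity between two matrices under the Frobenius inner product.
    return float(np.sum(X * Y) / (np.linalg.norm(X) * np.linalg.norm(Y)))

rng = np.random.default_rng(0)
m, n, r = 16, 32, 4                      # illustrative sizes, not from the paper
G = rng.standard_normal((m, n))          # stand-in for the full gradient

# Two factors B sharing the same column space but with different singular values.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((r, r)))
B_uniform = U @ np.diag([2.0, 2.0, 2.0, 2.0]) @ V.T
B_skewed = U @ np.diag([3.0, 1.0, 0.3, 0.1]) @ V.T

P = orthonormal_projection(G, B_skewed)  # same subspace for both factors
cos_uniform = cosine(lora_mapped_gradient(G, B_uniform), P)
cos_skewed = cosine(lora_mapped_gradient(G, B_skewed), P)
print("uniform singular values:", cos_uniform)
print("skewed singular values: ", cos_skewed)
```

With uniform singular values, the mapped gradient is a scalar multiple of the orthogonal projection, so the first cosine equals 1 up to floating-point error. With skewed singular values, the directions of the subspace are weighted unequally and the cosine drops below 1, which is the anisotropic distortion the abstract describes. Mapping through an orthonormal basis, as in `orthonormal_projection`, removes that dependence on the singular values in this simplified setting.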
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 3878