Retraction-free optimization over the Stiefel manifold with application to LoRA fine-tuning

14 May 2024 (modified: 10 May 2025) | Submitted to NeurIPS 2024 | CC BY 4.0
Keywords: Manifold, Landing, LoRA, fine-tuning
TL;DR: We develop landing theory for optimization over the Stiefel manifold and subsequently propose a Manifold-LoRA algorithm to accelerate the LoRA fine-tuning of large language models.
Abstract: Optimization over the Stiefel manifold plays a significant role in various machine learning tasks. Many existing algorithms either use a retraction operator to keep each iterate on the manifold, or solve an unconstrained quadratically penalized problem. The retraction operator in the former amounts to orthonormalizing a matrix and can be computationally costly for large-scale matrices. The latter approach usually relies on an unknown, large penalty parameter. To address these issues, we propose a retraction-free and penalty-parameter-free algorithm that lands on the manifold. A key component of the analysis is a convex-like property of the quadratic penalty of the Stiefel manifold, which enables us to explicitly characterize the penalty parameter. As an application, we introduce a new algorithm, Manifold-LoRA, which employs the landing technique and a carefully designed step-size strategy to accelerate low-rank adaptation (LoRA) in fine-tuning large language models. Numerical experiments on benchmark datasets demonstrate the efficiency of the proposed method.
Supplementary Material: zip
Primary Area: Optimization for deep networks
Submission Number: 8225
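To make the retraction-free idea in the abstract concrete, below is a minimal sketch of a landing-style update on the Stiefel manifold, following the generic landing field Λ(X) = skew(∇f(X) Xᵀ) X + λ X (Xᵀ X − I) rather than this paper's specific Manifold-LoRA algorithm or step-size strategy; the step size, the penalty weight λ, and the toy eigenvector objective are illustrative assumptions.

```python
import numpy as np

def landing_step(X, grad_f, eta, lam):
    """One retraction-free landing update for min f(X) subject to X^T X = I."""
    # Skew-symmetric "relative gradient": moves X roughly along the manifold.
    psi = 0.5 * (grad_f @ X.T - X @ grad_f.T)
    # Gradient of the quadratic penalty N(X) = 0.25 * ||X^T X - I||_F^2:
    # pulls X back toward orthonormality without any QR/SVD retraction.
    penalty_grad = X @ (X.T @ X - np.eye(X.shape[1]))
    return X - eta * (psi @ X + lam * penalty_grad)

# Toy example (illustrative): top-5 eigenvectors of a symmetric matrix A,
# i.e. minimize f(X) = -trace(X^T A X) over the Stiefel manifold St(50, 5).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = (A + A.T) / 2.0
A /= np.linalg.norm(A, 2)      # normalize so a fixed step size is reasonable
X, _ = np.linalg.qr(rng.standard_normal((50, 5)))
for _ in range(500):
    X = landing_step(X, grad_f=-2.0 * (A @ X), eta=0.1, lam=1.0)
print("orthogonality error:", np.linalg.norm(X.T @ X - np.eye(5)))
```

In this sketch the iterates are never explicitly orthonormalized; the penalty term drives the orthogonality error toward zero as the optimization proceeds, which is the computational advantage the abstract highlights over retraction-based methods.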