Singular Value Adaptation for Parameter-Efficient Fine Tuning

Susmit Agrawal; Krishn Vishwas Kher; Swarnim Maheshwari; Rishabh Lalla; Vineeth N. Balasubramanian

26 Sept 2024 (modified: 25 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Transfer learning, Adaptation, Parameter-Efficient Fine-tuning

TL;DR: We propose SiVA, a novel PEFT method grounded in theoretical insights, that markedly reduces the number of parameters compared to existing methods while achieving SoTA results on a range of CV and NLP tasks.

Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become a crucial approach for handling the growing complexity of large models and vast datasets across fields such as Computer Vision and Natural Language Processing. Among the most promising of these methods are Low-Rank Adaptation (LoRA) and its derivatives, which fine-tune a pre-trained weight matrix $\mathbf{W}$ by introducing a low-rank update matrix $\mathbf{\Delta W}$. While these approaches have demonstrated strong empirical performance, they remain largely heuristic, with little theoretical grounding to explain their behavior or guide the design of $\mathbf{\Delta W}$ for different objectives. This lack of theoretical insight limits our understanding of when these methods are most effective and how they can be systematically improved. In this paper, we propose a theoretical framework for analyzing and designing LoRA-based methods, with a focus on the formulation of $\mathbf{\Delta W}$. By establishing a deeper understanding of the interplay between $\mathbf{W}$ and $\mathbf{\Delta W}$, we aim to enable more efficient and targeted fine-tuning strategies, opening the door to novel variants that strike an optimal balance between performance and efficiency. Our proposed method, \textbf{Si}ngular \textbf{V}alue \textbf{A}daptation (SiVA), uses insights from our theoretical framework to incorporate inductive biases into the formulation of $\mathbf{\Delta W}$, leading to a PEFT method that is up to 50$\times$ more parameter efficient than LoRA while achieving comparable or better performance across various vision and language tasks.
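The abstract does not spell out how SiVA constructs $\mathbf{\Delta W}$, so the following is only a rough sketch of the general idea under stated assumptions. It contrasts the standard LoRA update $\mathbf{\Delta W} = \mathbf{B}\mathbf{A}$ with a hypothetical update that keeps the singular vectors of $\mathbf{W}$ frozen and trains only a shift of its singular values. The function names, matrix sizes, and the singular-value-only formulation are illustrative assumptions, not the authors' method.

```python
import numpy as np


def lora_delta(B, A):
    """Low-rank update: delta_W = B @ A, with B (d_out x r) and A (r x d_in)."""
    return B @ A


def lora_param_count(d_out, d_in, r):
    """Trainable parameters in one LoRA update of rank r."""
    return r * (d_out + d_in)


def singular_value_delta(W, delta_s):
    """Hypothetical update that shifts only the singular values of W.

    U and V^T from the SVD of W stay frozen; delta_s is the only trainable
    vector. This is an illustrative assumption, not the paper's formulation.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * delta_s) @ Vt  # equals U @ diag(delta_s) @ Vt


rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4
W = rng.standard_normal((d_out, d_in))  # stand-in for a pre-trained weight

# LoRA: r * (d_out + d_in) trainable parameters, update rank at most r.
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))
delta_W_lora = lora_delta(B, A)

# Singular-value-only variant: min(d_out, d_in) trainable parameters.
delta_s = 0.01 * rng.standard_normal(min(d_out, d_in))
delta_W_sv = singular_value_delta(W, delta_s)

print("LoRA params:", lora_param_count(d_out, d_in, r))
print("singular-value params:", delta_s.size)
```

In this toy setting the LoRA update trains 448 values and the singular-value-only update trains 48. The ratio depends entirely on the chosen dimensions and rank, and it should not be read as the 50$\times$ figure reported in the abstract.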

Supplementary Material: pdf

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 6594
