SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Vijay Lingam; Atula Tejaswi Neerkaje; Aditya Vavre; Aneesh Shetty; Gautham Krishna Gudur; Joydeep Ghosh; Eunsol Choi; Alex Dimakis; Aleksandar Bojchevski; Sujay Sanghavi

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 poster, License: CC0 1.0

Keywords: Parameter Efficient Fine Tuning, Large Language Models, Deep Learning

Abstract: Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze the pre-trained model weights $\mathbf{W}$ and inject learnable matrices $\mathbf{\Delta W}$. These $\mathbf{\Delta W}$ matrices are structured for efficient parameterization, often using techniques such as low-rank approximations or scaling vectors. However, these methods typically exhibit a performance gap compared to full fine-tuning. While recent PEFT methods have narrowed this gap, they do so at the expense of additional learnable parameters. We propose SVFT, a *simple* approach that structures $\mathbf{\Delta W}$ based on the specific weight matrix $\mathbf{W}$. SVFT updates $\mathbf{W}$ as a sparse combination $M$ of outer products of its singular vectors, training only the coefficients of these combinations. Crucially, we make additional off-diagonal elements in $M$ learnable, enabling a smooth trade-off between trainable parameters and expressivity; this aspect distinctly sets our approach apart from previous works that leverage singular values. Extensive experiments on language and vision benchmarks show that SVFT recovers up to **96%** of full fine-tuning performance while training only **0.006 to 0.25%** of parameters, outperforming existing methods that recover only up to **85%** with **0.03 to 0.8%** of the trainable parameter budget.
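The update described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the thin SVD $\mathbf{W} = U\,\mathrm{diag}(\sigma)\,V^\top$, keeps $\mathbf{W}$, $U$, $\sigma$ and $V$ frozen, and forms $\mathbf{\Delta W} = U M V^\top = \sum_{(i,j)} M_{ij}\,\mathbf{u}_i \mathbf{v}_j^\top$, where only the nonzero entries of $M$ are trainable. The banded sparsity pattern (main diagonal plus `d` off-diagonals on each side) and the names `banded_mask` and `SVFTSketch` are hypothetical choices made for illustration, not the authors' implementation, and the training loop is omitted.

```python
import numpy as np


def banded_mask(r: int, d: int) -> np.ndarray:
    """r x r boolean mask: main diagonal plus d off-diagonals on each side.

    This banded pattern is an illustrative assumption; other sparse patterns
    of M could be used instead.
    """
    i, j = np.indices((r, r))
    return np.abs(i - j) <= d


class SVFTSketch:
    """Hypothetical SVFT-style adapter for one frozen weight matrix W (out x in)."""

    def __init__(self, W: np.ndarray, d: int = 0):
        self.W = W  # frozen pre-trained weights, never updated
        # Thin SVD: W = U @ diag(s) @ Vt; U, s and Vt are frozen as well.
        self.U, self.s, self.Vt = np.linalg.svd(W, full_matrices=False)
        self.mask = banded_mask(len(self.s), d)
        # The only trainable parameters: one coefficient per allowed entry of M.
        # Zero init means the adapted layer starts identical to the frozen one.
        self.coeffs = np.zeros(int(self.mask.sum()))

    def M(self) -> np.ndarray:
        """Scatter the trainable coefficients into a sparse r x r matrix M."""
        r = len(self.s)
        M = np.zeros((r, r))
        M[self.mask] = self.coeffs
        return M

    def delta_W(self) -> np.ndarray:
        # Delta W = sum over allowed (i, j) of M[i, j] * u_i v_j^T = U M V^T
        return self.U @ self.M() @ self.Vt

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, in); the output has shape (batch, out)
        return x @ (self.W + self.delta_W()).T


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
layer = SVFTSketch(W, d=1)
print(layer.coeffs.size)  # 10 trainable values vs. 24 entries in W
```

With `d=0`, $M$ is diagonal and the update only shifts the singular values, since $\mathbf{W} + \mathbf{\Delta W} = U\,\mathrm{diag}(\sigma + m)\,V^\top$; increasing `d` adds off-diagonal coefficients, which is one way to realize the trade-off between trainable parameters and expressivity that the abstract describes.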

Primary Area: Deep learning architectures

Submission Number: 5572
