SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Published: 25 Sept 2024, Last Modified: 06 Nov 2024
Venue: NeurIPS 2024 poster
License: CC0 1.0
Keywords: Parameter Efficient Fine Tuning, Large Language Models, Deep Learning
Abstract: Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights $\mathbf{W}$ and inject learnable matrices $\mathbf{\Delta W}$. These $\mathbf{\Delta W}$ matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically exhibit a performance gap compared to full fine-tuning. While recent PEFT methods have narrowed this gap, they do so at the expense of additional learnable parameters. We propose SVFT, a *simple* approach that structures $\mathbf{\Delta W}$ based on the specific weight matrix $\mathbf{W}$. SVFT updates $\mathbf{W}$ as a sparse combination $M$ of outer products of its singular vectors, training only the coefficients of these combinations. Crucially, we make additional off-diagonal elements in $M$ learnable, enabling a smooth trade-off between trainable parameters and expressivity, an aspect that distinctly sets our approach apart from previous works leveraging singular values. Extensive experiments on language and vision benchmarks show that SVFT recovers up to **96%** of full fine-tuning performance while training only **0.006 to 0.25%** of parameters, outperforming existing methods that achieve only up to **85%** performance with **0.03 to 0.8%** of the trainable parameter budget.
Primary Area: Deep learning architectures
Submission Number: 5572
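
The update described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the update takes the form $\mathbf{\Delta W} = \mathbf{U} M \mathbf{V}^\top$, where $\mathbf{U}$ and $\mathbf{V}$ hold the singular vectors of $\mathbf{W}$, and it assumes a banded sparsity pattern for $M$ (the diagonal plus `k` off-diagonals on each side). The helper names `svft_delta` and `banded_mask` and the parameter `k` are hypothetical, chosen only for this example.

```python
import numpy as np

def svft_delta(W, M):
    """Return U @ M @ Vt, i.e. sum_ij M[i, j] * outer(u_i, v_j),
    where u_i, v_j are the left/right singular vectors of W."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ M @ Vt

def banded_mask(n, k):
    """Boolean n x n mask: the diagonal plus k off-diagonals on each side.
    The banded pattern is an assumption made for illustration."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= k

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))          # stand-in for a frozen pre-trained weight
r = min(W.shape)

mask = banded_mask(r, k=1)               # which entries of M are learnable
coeffs = rng.standard_normal(mask.sum()) # the only trainable values in this sketch

M = np.zeros((r, r))
M[mask] = coeffs                         # sparse coefficient matrix

W_new = W + svft_delta(W, M)             # updated weight; U and V stay fixed
n_trainable = int(mask.sum())            # grows with k, up to r * r
```

Raising `k` makes more off-diagonal entries of `M` learnable, which is one way to picture the trade-off between trainable parameters and expressivity mentioned in the abstract; `k=0` leaves only the diagonal.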