One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Published: 10 Oct 2024 · Last Modified: 19 Nov 2024 · AFM 2024 Oral · CC BY 4.0
Keywords: Foundation models, LoRA, PEFT, fine-tuning, language models, vision transformer, decision transformer
TL;DR: We propose a novel PEFT method that initializes LoRA weights in a data-driven manner and re-distributes ranks to maximize explained variance across the model.
Abstract: Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to modulate the pre-trained weights via a low-rank adaptation (LoRA) of newly introduced weights. These weight matrices are usually initialized at random with the same rank for each layer across the FM, which results in suboptimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner: we compute the singular value decomposition of activation vectors and initialize the new LoRA matrices with the obtained right-singular vectors. Finally, we re-distribute the ranks among layers to explain the maximal amount of variance across all layers. This results in an adaptive allocation of ranks per weight matrix while inheriting all benefits of LoRA. We apply our new method, **E**xplained **V**ariance **A**daptation (EVA), to a variety of fine-tuning tasks comprising language understanding and generation, image classification, and reinforcement learning. EVA consistently attains the highest average score across a multitude of tasks per domain.
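
To make the two steps concrete, below is a minimal sketch of SVD-based initialization followed by explained-variance rank redistribution. This is an illustration under stated assumptions, not the authors' implementation: the `eva_init` helper, its signature, and the premise that per-layer input activations have already been collected into matrices are all hypothetical simplifications.

```python
import torch

def eva_init(activations: dict[str, torch.Tensor], rank_budget: int):
    """Sketch of EVA-style LoRA initialization (hypothetical helper, illustration only).

    activations: layer name -> (n_samples, d_in) matrix of that layer's input activations,
                 assumed to have been collected from minibatches of the fine-tuning data.
    rank_budget: total number of ranks to distribute across all layers.
    Returns: layer name -> (rank, A_init), where A_init holds the top right-singular vectors.
    """
    components = []   # (explained-variance ratio, layer name) for every singular component
    right_vectors = {}
    for name, X in activations.items():
        # Right-singular vectors of the activation matrix are the directions of
        # maximal variance in that layer's inputs.
        _, S, Vh = torch.linalg.svd(X, full_matrices=False)
        right_vectors[name] = Vh
        for ev in (S**2 / (S**2).sum()).tolist():
            components.append((ev, name))

    # Re-distribute the rank budget greedily towards the components that explain
    # the most variance, yielding an adaptive rank per weight matrix.
    components.sort(key=lambda t: t[0], reverse=True)
    ranks = {name: 0 for name in activations}
    for _, name in components[:rank_budget]:
        ranks[name] += 1

    inits = {}
    for name, r in ranks.items():
        # A is initialized with the top-r right-singular vectors; B starts at zero
        # (as in standard LoRA), so fine-tuning begins from the pre-trained model.
        inits[name] = (r, right_vectors[name][:r].clone())
    return inits
```

In a standard LoRA parameterization W + BA with A of shape (r, d_in), each returned matrix would slot in as A while B is zero-initialized, so the adapted model matches the pre-trained one before training starts.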
Submission Number: 1