Keywords: Low-Rank Adaptation, Parameter-Efficient Finetuning
Abstract: As the scale of large language models (LLMs) continues to increase, parameter-efficient finetuning (PEFT) of LLMs has drawn much attention. One of the most popular PEFT methods is low-rank adaptation (LoRA), which has attracted numerous subsequent works aimed at improving it. While rank is one of the essential hyperparameters for LoRA, many previous works propose dynamically allocating rank to reduce the computational demand, e.g., pruning insignificant channels to reduce the rank. This paper proposes a principled method that proactively guides the LoRA module to fully utilize its allocated rank. We first provide a new perspective for understanding the difference between LoRA and full finetuning. We demonstrate that the two weight matrices in the LoRA module serve as proxies for the input and output gradients. Since the input of each layer is generally more stable than the gradient, channel differences are mainly reflected in the left weight matrix ($B$). We further propose a principled plug-in method, grounded in theoretical analysis and empirical findings. Our method reweights the two weight matrices in LoRA using a simple yet effective algorithm to further stabilize training and encourage the training of insignificant channels. Experiments are conducted on widely used models (Llama, Mistral, etc.) and benchmarks (GSM8k, GLUE, SQuAD, etc.), where our proposed method significantly boosts the performance of LoRA and its variants as a plug-in.
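A minimal sketch of the LoRA setup the abstract refers to, with a hypothetical per-rank reweighting hook. The scale vector `s` and its placement are illustrative assumptions; the abstract does not specify the paper's actual reweighting algorithm. Standard LoRA adds a low-rank update $BA$ to a frozen weight $W_0$:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2

W0 = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable "right" matrix, small init
B = np.zeros((d_out, r))                   # trainable "left" matrix, zero init (standard LoRA)

# Hypothetical per-rank-channel scales: reweighting insignificant
# channels could encourage them to train (an assumption for
# illustration, not the paper's actual method).
s = np.ones(r)

x = rng.standard_normal(d_in)
y = W0 @ x + B @ (s * (A @ x))             # LoRA forward pass with reweighting
```

With `B` initialized to zero, the LoRA branch contributes nothing at the start of training, so `y` equals `W0 @ x`; the update grows only as `B` is trained.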
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11879