Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

ACL ARR 2024 June Submission 4841 Authors

16 Jun 2024 (modified: 02 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: In this paper, we present \textbf{Delta-LoRA}, a novel parameter-efficient approach to fine-tune large language models (LLMs). In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $A$ and $B$, but also propagates the learning to the pre-trained weights $W$ via updates that use the delta of the product of the two low-rank matrices ($A^{(t+1)}B^{(t+1)} - A^{(t)}B^{(t)}$). Such a strategy effectively addresses the limitation that the incremental update of the low-rank matrices alone is inadequate for learning representations suited to downstream tasks. Moreover, since updating $W$ requires neither computing its gradients nor storing their momentum, Delta-LoRA has memory requirements and computational costs comparable to LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA.
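A minimal sketch of the update rule described in the abstract, for a single linear layer. The class `DeltaLoRALinear`, the scaling factor `lambda_delta`, the helper `lora_step`, and the initialization details are illustrative assumptions, not taken from the paper; the only element drawn from the abstract is that the frozen weight $W$ is updated with the delta $A^{(t+1)}B^{(t+1)} - A^{(t)}B^{(t)}$ of the low-rank product, so no gradients or optimizer states are kept for $W$.

```python
import torch

class DeltaLoRALinear(torch.nn.Module):
    """Illustrative Delta-LoRA layer (assumed layout: W is (out, in), delta = A @ B)."""

    def __init__(self, in_features, out_features, rank=8, lambda_delta=1.0):
        super().__init__()
        # Pre-trained weight W is frozen: no gradients or momentum are stored for it.
        self.weight = torch.nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False
        )
        torch.nn.init.normal_(self.weight, std=0.02)
        # Trainable low-rank factors; B is zero-initialized so A @ B starts at zero,
        # following the usual LoRA convention (an assumption here, not from the abstract).
        self.A = torch.nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(rank, in_features))
        self.lambda_delta = lambda_delta  # hypothetical scale for the delta update

    def forward(self, x):
        # y = x (W + A B)^T: frozen weight plus the low-rank adaptation.
        return x @ (self.weight + self.A @ self.B).T

    @torch.no_grad()
    def delta_update(self, A_prev, B_prev):
        # Propagate learning to W via the delta of the low-rank product:
        # W <- W + lambda * (A^{(t+1)} B^{(t+1)} - A^{(t)} B^{(t)}).
        self.weight += self.lambda_delta * (self.A @ self.B - A_prev @ B_prev)


def lora_step(layer, optimizer, loss):
    # Snapshot A and B before the optimizer step, then apply the delta to W.
    A_prev = layer.A.detach().clone()
    B_prev = layer.B.detach().clone()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    layer.delta_update(A_prev, B_prev)
```

Because `delta_update` runs under `torch.no_grad()` and only adds a quantity already computed from $A$ and $B$, it adds no extra gradient computation or optimizer state for $W$, which is how the abstract's claim of LoRA-comparable memory and compute would be realized in this sketch.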
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 4841