ShareLoRA: Less Tuning, More Performance for LoRA Fine-tuning of LLMs

ACL ARR 2024 June Submission5577 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · License: CC BY 4.0
Abstract: Due to the prohibitively expensive cost of fully fine-tuning large language models (LLMs), various parameter-efficient fine-tuning (PEFT) methods have been developed. These methods largely rely on fine-tuning a few add-on modules, popularly referred to as {\it adapters}, that correspond to only a \textit{small fraction of the LLM parameters}. In particular, low-rank adaptation (LoRA) has demonstrated impressive parameter efficiency while yielding performance close to that of full fine-tuning. However, classical LoRA may still fine-tune more parameters than the intrinsic rank of the pre-trained weights would require \cite{aghajanyan2020intrinsic}, leaving room for further parameter reduction. To mitigate this, a few recent works have proposed strategies for freezing the LoRA projection matrices, however at the cost of additional FLOPs. To improve fine-tuning efficiency, in this work we present ShareLoRA, which exploits the redundancy in pre-trained model weights and shares LoRA modules to significantly reduce the trainable parameter count. Specifically, ShareLoRA automatically identifies the redundancy of the pre-trained weights and determines which LoRA adapters can share parameters. To this end, ShareLoRA uses the similarity between representations to assess information redundancy and a greedy algorithm to maximize the sharing of LoRA modules. We perform extensive evaluations with LLaMA-family LLMs across various tasks. Notably, with a PEFT parameter count reduced by up to \textbf{23}$\%$, ShareLoRA performs on par with or better than existing PEFT alternatives.
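The abstract only outlines the mechanism (representation similarity to detect redundancy, then greedy grouping of LoRA adapters); the full algorithm is not given here. The snippet below is a minimal, hypothetical sketch of that idea, assuming cosine similarity between flattened per-layer representations and a user-chosen similarity `threshold`; it is an illustration of one plausible reading, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(reps):
    """reps: list of [num_tokens, hidden_dim] representations, one per layer.
    Returns a [num_layers, num_layers] matrix of pairwise cosine similarities."""
    flat = torch.stack([r.flatten() for r in reps])   # [num_layers, d]
    flat = F.normalize(flat, dim=-1)
    return flat @ flat.T

def greedy_share_groups(sim, threshold=0.9):
    """Greedily group layers whose pairwise similarity exceeds `threshold`;
    layers within one group would share a single LoRA adapter (assumption)."""
    num_layers = sim.size(0)
    groups, assigned = [], set()
    for i in range(num_layers):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, num_layers):
            if j in assigned:
                continue
            # require the candidate layer to be similar to every group member
            if all(sim[j, k] >= threshold for k in group):
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups

# Toy usage with random per-layer representations (illustration only).
reps = [torch.randn(16, 64) for _ in range(8)]
sim = cosine_similarity_matrix(reps)
print(greedy_share_groups(sim, threshold=0.5))
```

Under this reading, each group of layers would be served by one shared pair of LoRA projection matrices, which is how the trainable parameter count shrinks as sharing increases.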
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: parameter-efficient-training,fine-tuning,NLP in resource-constrained settings
Contribution Types: Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 5577