Keywords: Parameter-efficient fine-tuning, Low-Rank Adaptation, Lottery Ticket Hypothesis, Pruning
TL;DR: We sparsify LoRAs using random masking techniques, reducing trainable parameters by up to 87% without performance loss, enabling more efficient fine-tuning of large models.
Abstract: Low-Rank Adaptation (LoRA), a prominent parameter-efficient fine-tuning (PEFT) method, offers an effective strategy for adapting large pre-trained models to specific tasks with minimal computational overhead. LoRA achieves this by adding low-rank parameter matrices to the frozen pre-trained model. However, despite their efficiency, LoRA and its variants modify all elements of a parameter block, which is unnecessary because LoRA primarily aims to adjust a small set of subspaces that capture task-specific knowledge. Drawing inspiration from the Lottery Ticket Hypothesis (LTH), which posits that dense neural networks contain sparse subnetworks capable of performing similarly to fully parameterized models, we investigate whether similar sparse subnetworks exist for low-rank adapters. We demonstrate that such subnetworks, often referred to as "winning tickets" in the context of LTH, indeed exist for low-rank adapters. We introduce a method to identify this sparse subset of weights for each layer by relating the top subspaces of the pre-trained parameter block to the elements of the corresponding weight matrix. This subset is then fine-tuned using LoRA. We show that this sparse subset is not necessarily unique; as long as sparsity is kept within a certain bound defined by the task, random subnetworks with similar sparsity can act as winning tickets. Building on this discovery, we propose a novel approach called Partial-LoRA, which adds sparse low-rank parameters to pre-trained models. Through extensive experiments on 8 vision and 4 language tasks, we demonstrate that Partial-LoRA can reduce trainable parameters by up to 87% while maintaining, and in some cases even improving, model performance. Our work thus reduces memory needs and theoretically grounds sparse LoRAs.
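To illustrate the general idea of a sparsely masked low-rank adapter, below is a minimal PyTorch sketch. It applies fixed random binary masks to the LoRA factors so that only unmasked entries are effectively trained; the `SparseLoRALinear` name, the random masking scheme, and the `density` hyperparameter are illustrative assumptions, not the paper's Partial-LoRA construction (which selects the subset via the top subspaces of the pre-trained parameter block).

```python
import torch
import torch.nn as nn

class SparseLoRALinear(nn.Module):
    """Frozen linear layer plus a sparsely masked low-rank update:
    y = W x + ((M_B * B) @ (M_A * A)) x, with fixed binary masks M_A, M_B.
    Masked entries receive zero gradient, so only the unmasked entries
    of A and B are effectively trained. Illustrative sketch only.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, density: float = 0.13):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pre-trained weights stay frozen
            p.requires_grad_(False)
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Fixed random binary masks; roughly `density` of entries are kept.
        self.register_buffer("mask_A", (torch.rand(rank, in_f) < density).float())
        self.register_buffer("mask_B", (torch.rand(out_f, rank) < density).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sparse low-rank update added on top of the frozen base projection.
        delta = (self.mask_B * self.B) @ (self.mask_A * self.A)
        return self.base(x) + x @ delta.T
```

In practice such a wrapper would replace, for example, the query and value projections of a frozen transformer, with the mask density chosen per task to stay within the sparsity bound discussed in the abstract.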
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2469