Keywords: PEFT, LoRA, MoE
TL;DR: Seeded LoRA is a new approach to Collaborative Fine-Tuning that merges specialized language model adapters without requiring post-merge fine-tuning.
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods facilitate the cost-effective adaptation of pretrained language models to specific tasks and domains. These methods have enabled the open-source community to develop thousands of specialized models tailored to various domains and tasks. Collaborative Fine-Tuning (CoFT) is the paradigm that seeks to merge these specialized models into a single model -- often a routed Mixture-of-Experts (MoE) model -- to achieve better generalization across domains and tasks. However, current CoFT models require a post-merge fine-tuning stage to successfully combine existing models, making CoFT approaches inaccessible to users who lack fine-tuning expertise. In this work, we introduce Seeded LoRA, a novel CoFT approach that does not require post-merge fine-tuning, thus enabling plug-and-play PEFT adapter merging. Seeded LoRA significantly outperforms LoRA and MoE LoRA (MoLoRA) approaches, improving by an average of 7 percentage points across a battery of 16 zero-shot tasks, and we find that the main benefit of Seeded LoRA comes from mitigating task interference during fine-tuning. Seeded LoRA works by initializing the model, before fine-tuning, with a generic seed expert low-rank adapter that was itself fine-tuned on a small random subset of the fine-tuning data, so that subsequent fine-tuning runs start in the same optimization subspace. This process enables the integration of any combination of independently fine-tuned models through simple averaging of expert adapter outputs. We show that averaging, i.e., routing that assigns equal probability weight to each expert, is equivalent to a grouped convolution, which explains its effectiveness. Additionally, we study subtle routing failures in post-merge fine-tuning and highlight that Seeded LoRA can alleviate most routing failures, making it a suitable base method for future routed CoFT approaches.
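To make the merging step described in the abstract concrete, below is a minimal PyTorch sketch of equal-weight ("averaging") routing over independently trained LoRA expert adapters on top of a frozen linear layer. The class and parameter names (AveragedLoRAExperts, LoRAAdapter, r, alpha) are hypothetical illustrations, not the authors' implementation; in Seeded LoRA each expert would additionally be initialized from a shared seed adapter so that this plug-and-play average behaves well without post-merge fine-tuning.

```python
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """One low-rank adapter: x -> B(A(x)) scaled by alpha / r (standard LoRA form)."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.A = nn.Linear(d_in, r, bias=False)
        self.B = nn.Linear(r, d_out, bias=False)
        nn.init.zeros_(self.B.weight)  # common LoRA init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.A(x)) * self.scale


class AveragedLoRAExperts(nn.Module):
    """Frozen base layer plus several LoRA experts merged by averaging their outputs,
    i.e., routing with equal probability weight assigned to every expert."""

    def __init__(self, base_linear: nn.Linear, experts: list[LoRAAdapter]):
        super().__init__()
        self.base = base_linear
        self.base.requires_grad_(False)  # pretrained weights stay frozen
        self.experts = nn.ModuleList(experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        # Plug-and-play merge: simple mean of expert adapter outputs,
        # with no post-merge fine-tuning step.
        delta = sum(e(x) for e in self.experts) / len(self.experts)
        return out + delta


# Usage sketch: merge three independently fine-tuned experts for a 768-dim layer.
layer = AveragedLoRAExperts(nn.Linear(768, 768), [LoRAAdapter(768, 768) for _ in range(3)])
y = layer(torch.randn(4, 768))
```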
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9384