Seeded LoRA: Collaborative Fine-Tuning Through Seed Initialization of Adapters

Published: 21 Jun 2024, Last Modified: 26 Jul 2024, ES-FoMo-II 2024 Poster, CC BY 4.0
Keywords: Parameter-Efficient Fine-Tuning, Collaborative Fine-Tuning, Mixture-of-Experts models, Uniform Routing, Unsupervised Domain Discovery, zero-shot tasks, Grouped Convolutions, multi-head processing, optimization space, model fine-tuning
TL;DR: Seeded LoRA, a model merging method that does not require post-merge fine-tuning, achieves SoTA results compared to other MoE PEFT approaches.
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods enable cost-effective adaptation of pretrained language models to specific tasks and domains. Collaborative Fine-Tuning (CoFT) seeks to merge these specialized models into a single model, often a routed Mixture-of-Experts (MoE) model, to achieve better generalization across domains and tasks. However, current CoFT models require a post-merge fine-tuning stage, making these approaches inaccessible to users lacking fine-tuning expertise. We introduce Seeded LoRA, a novel CoFT approach that does not require post-merge fine-tuning, enabling plug-and-play PEFT adapter merging. Seeded LoRA outperforms LoRA and MoE LoRA (MoLoRA) approaches by an average of 7 percentage points across 16 zero-shot tasks. Seeded LoRA works by initializing a model with a generic seed expert low-rank adapter, ensuring that subsequent fine-tuning runs lie in the same optimization space and exhibit linear mode connectivity. This allows independently fine-tuned models to be integrated into a single model using a static, untrained soft uniform probability router. We show that this formulation is equivalent to grouped convolution or multi-head processing, which explains its effectiveness. Additionally, we highlight that Seeded LoRA alleviates most routing failures seen in post-merge fine-tuning, making it a suitable base method for future routed CoFT approaches.
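To make the merging mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the core idea: several LoRA experts initialized from one shared seed adapter on top of a frozen base layer, combined at inference with a static, untrained uniform router (a simple 1/E average). This is an illustrative reconstruction under assumptions, not the authors' released code; the class name `SeededLoRALinear` and the parameters `num_experts` and `rank` are hypothetical, and the seed adapter is shown as a random initialization rather than the generic pre-trained seed expert used in the paper.

```python
import torch
import torch.nn as nn


class SeededLoRALinear(nn.Module):
    """Sketch of Seeded-LoRA-style merging (assumed interface, not official code).

    A frozen base linear layer plus several low-rank experts. Every expert is
    initialized as a copy of one shared "seed" adapter, so independently
    fine-tuned runs stay in the same optimization space; at merge time the
    experts are combined with a static soft uniform router (no trained gate,
    no post-merge fine-tuning).
    """

    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # PEFT: base weights stay frozen

        d_in, d_out = base.in_features, base.out_features
        # One generic seed adapter; in the paper this seed expert would itself
        # be (briefly) trained before being copied. Here it is a placeholder.
        seed_A = torch.randn(rank, d_in) * 0.01
        seed_B = torch.zeros(d_out, rank)  # zero B => adapters start as identity
        self.A = nn.ParameterList([nn.Parameter(seed_A.clone()) for _ in range(num_experts)])
        self.B = nn.ParameterList([nn.Parameter(seed_B.clone()) for _ in range(num_experts)])
        self.num_experts = num_experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        # Static soft uniform router: each expert contributes with weight 1/E.
        # Summing E independent low-rank branches like this is what makes the
        # formulation analogous to grouped convolution / multi-head processing.
        for A, B in zip(self.A, self.B):
            out = out + (x @ A.t() @ B.t()) / self.num_experts
        return out


# Usage sketch: wrap a pretrained projection, fine-tune each expert's (A, B)
# pair on a separate domain, then use the merged module as-is.
layer = SeededLoRALinear(nn.Linear(768, 768), num_experts=4, rank=8)
y = layer(torch.randn(2, 16, 768))  # (batch, sequence, hidden)
```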
Submission Number: 38