Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
TL;DR: In this work, we propose GOAT, a novel framework that enhances LoRA fine-tuning by adaptively integrating SVD-structured priors and aligning low-rank gradients with those of a fully fine-tuned MoE through a theoretically derived scaling.
Abstract: While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT).
Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal use of pre-trained knowledge.
Another path to improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to adopt SVD priors in the LoRA MoE architecture.
To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with that of a fully fine-tuned MoE by deriving a theoretical scaling factor.
We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE’s efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT’s state-of-the-art performance, closing the gap with Full FT. Our code is available at: https://github.com/Facico/GOAT-PEFT
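For intuition, the sketch below shows in PyTorch one way the two ideas in the abstract could be wired together: a LoRA MoE layer whose experts are initialized from different SVD slices of the frozen pre-trained weight, and a scaling factor applied to the combined low-rank update. The layer name, router, hyperparameters, and the constant scaling value are illustrative assumptions, not GOAT's actual implementation; in particular, GOAT derives the scaling theoretically rather than treating it as a hyperparameter, and SVD-initialized LoRA variants typically also correct the frozen weight so training starts from the pre-trained function, a detail omitted here. See the repository linked above for the real code.

```python
import torch
import torch.nn as nn


class LoRAMoELinear(nn.Module):
    """Toy LoRA Mixture-of-Experts layer (illustration only, not the GOAT code).

    Each expert's low-rank factors (A_e, B_e) are initialized from a distinct
    slice of the SVD of the frozen pre-trained weight, so different experts
    carry different pre-trained priors. The combined low-rank update is
    multiplied by `scaling`; in the paper this factor is derived theoretically,
    here it is just a placeholder hyperparameter.
    """

    def __init__(self, weight: torch.Tensor, num_experts: int = 4,
                 rank: int = 8, top_k: int = 2, scaling: float = 2.0):
        super().__init__()
        out_f, in_f = weight.shape
        assert num_experts * rank <= min(out_f, in_f), "not enough singular values"
        self.weight = nn.Parameter(weight.clone(), requires_grad=False)  # frozen W0
        self.scaling, self.top_k = scaling, top_k

        # SVD of the pre-trained weight; expert e receives singular triplets
        # [e*rank, (e+1)*rank) instead of every expert sharing the top-r ones.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.router = nn.Linear(in_f, num_experts, bias=False)
        self.A, self.B = nn.ParameterList(), nn.ParameterList()
        for e in range(num_experts):
            idx = slice(e * rank, (e + 1) * rank)
            sqrt_s = S[idx].sqrt()
            self.A.append(nn.Parameter(sqrt_s[:, None] * Vh[idx]))    # (rank, in_f)
            self.B.append(nn.Parameter(U[:, idx] * sqrt_s[None, :]))  # (out_f, rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                               # frozen pre-trained path
        gates = torch.softmax(self.router(x), dim=-1)          # (..., num_experts)
        top_val, top_idx = gates.topk(self.top_k, dim=-1)
        top_val = top_val / top_val.sum(dim=-1, keepdim=True)  # renormalize chosen gates
        update = torch.zeros_like(base)
        for e in range(len(self.A)):
            gate_e = (top_val * (top_idx == e)).sum(dim=-1, keepdim=True)  # 0 if unselected
            update = update + gate_e * ((x @ self.A[e].T) @ self.B[e].T)
        return base + self.scaling * update
```

As a usage example, one could wrap an existing layer's weight with `LoRAMoELinear(linear.weight.data)` and train only the router and expert parameters while the original weight stays frozen.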
Lay Summary: Large language models like ChatGPT are powerful but require expensive and time-consuming training to perform well on specific tasks. A popular method called LoRA allows these models to be fine-tuned more efficiently by updating only a small part of the model. However, LoRA often doesn’t perform as well as fully fine-tuning the entire model. Our work introduces a new method called GOAT, which improves LoRA by combining it with a system that uses multiple expert components, each specializing in different tasks. We also discover a simple mathematical adjustment that helps these expert components work better together. This makes our method both efficient and powerful—achieving performance close to full fine-tuning without needing extra computational cost. We tested our method on 25 tasks, ranging from language understanding to image recognition, and found that it consistently outperforms previous approaches.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/Facico/GOAT-PEFT
Primary Area: Deep Learning->Foundation Models
Keywords: Parameter-Efficient Fine-Tuning, Large Language Model, Mixture of Experts, Low-Rank Adaptation
Submission Number: 1201