DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models
Abstract: Recent advances in Large Language Models (LLMs) have delivered robust performance across diverse tasks, but fine-tuning these models for specific domains remains resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) address this challenge by fine-tuning only a small subset of parameters. However, existing methods for fusing multiple LoRAs lack dynamic fusion conditioned on contextual inputs and often increase inference time due to token-level operations. We propose DLP-LoRA, a Dynamic Lightweight Plugin that employs a mini-MLP module with only 5M parameters to dynamically fuse multiple LoRAs at the sentence level rather than the token level, selecting candidate LoRAs via top-$p$ sampling. By leveraging parallel computation, this approach keeps inference time below 2x that of single-LoRA inference. Evaluations across 26 tasks, including multiple-choice questions and question answering, demonstrate that DLP-LoRA achieves an average accuracy of 91.9\% on multiple-choice datasets and significant improvements in BLEU, ROUGE-1, and ROUGE-L scores (54.1\%, 43.5\%, and 40.8\%, respectively) on QA datasets, outperforming many LoRA baselines across different LLM backbones. DLP-LoRA effectively balances performance and efficiency, making it a practical solution for dynamic multi-task adaptation in LLMs.
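The abstract describes the core mechanism only at a high level: a small MLP scores candidate LoRAs once per input sentence, a top-$p$ rule keeps the most probable subset, and their outputs are fused as a weighted sum. The snippet below is a minimal PyTorch-style sketch of that idea, not the authors' implementation; the class and function names (`MiniMLPRouter`, `top_p_lora_weights`), the router architecture, and the use of a mean-pooled sentence embedding are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MiniMLPRouter(nn.Module):
    """Hypothetical sentence-level router: scores each candidate LoRA once per input sentence."""

    def __init__(self, hidden_dim: int, num_loras: int, router_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, router_dim),
            nn.ReLU(),
            nn.Linear(router_dim, num_loras),
        )

    def forward(self, sentence_embedding: torch.Tensor) -> torch.Tensor:
        # sentence_embedding: (batch, hidden_dim) -> LoRA probabilities: (batch, num_loras)
        return F.softmax(self.mlp(sentence_embedding), dim=-1)


def top_p_lora_weights(probs: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Keep the smallest set of LoRAs whose cumulative probability reaches p, then renormalize."""
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Mask out LoRAs outside the top-p nucleus (always keep the highest-probability one).
    keep = (cumulative - sorted_probs) < p
    keep[..., 0] = True
    filtered = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    filtered = filtered / filtered.sum(dim=-1, keepdim=True)
    return torch.zeros_like(probs).scatter(-1, sorted_idx, filtered)


# Toy usage: fuse per-LoRA delta outputs with sentence-level weights.
batch, hidden_dim, num_loras = 2, 768, 8
router = MiniMLPRouter(hidden_dim, num_loras)
sentence_emb = torch.randn(batch, hidden_dim)      # e.g. a pooled embedding of the input sentence
weights = top_p_lora_weights(router(sentence_emb), p=0.9)

# Stand-in for each LoRA's delta (B_i @ A_i @ x), stacked as (num_loras, batch, hidden_dim).
lora_deltas = torch.randn(num_loras, batch, hidden_dim)
fused_delta = torch.einsum("bl,lbh->bh", weights, lora_deltas)  # weighted sum over selected LoRAs
```

Because the routing decision is made once per sentence rather than per token, the weighted sum over the selected LoRA deltas can be computed in parallel, which is consistent with the abstract's claimed sub-2x inference overhead.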
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Multi-LoRA fusion, Parameter Efficient Tuning, LoRA, Cross-Task Generalization
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 903