SiRA: Sparse Mixture of Low Rank Adaptation

ACL ARR 2024 June Submission 5155 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Parameter Efficient Tuning (PET) techniques such as Low-Rank Adaptation (LoRA) are effective methods for adapting Large Language Models to downstream tasks. We propose Sparse Mixture of Low Rank Adaptation (SiRA), which applies a Sparse Mixture of Experts (SMoE) by enforcing conditional computation over the top-k LoRA weights. SiRA is optimized through a combination of training techniques: an auxiliary loss that encourages load balancing, a capacity limit that restricts the maximum number of tokens each expert can process, and a novel expert dropout applied on top of the gating network. Through extensive experiments, we show that SiRA outperforms LoRA and other mixture-of-experts approaches across different single-task and multi-task settings. Results also show that SiRA learns more orthogonal low-rank spaces and consumes less compute than other MoE variants.
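
The sketch below illustrates the general idea described in the abstract (top-k gating over LoRA experts, expert dropout on the gate, and a load-balancing auxiliary loss). It is not the authors' implementation: the class name `SiRALinear`, the parameter names, the Switch-style form of the auxiliary loss, and the omission of the capacity limit are all assumptions made for illustration.

```python
# Minimal sketch of a SiRA-style layer (assumed design, not the paper's code):
# a frozen base linear projection plus E low-rank (LoRA) experts, a top-k
# gating network with expert dropout, and a Switch-style load-balancing loss.
# The capacity limit described in the abstract is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiRALinear(nn.Module):
    def __init__(self, d_in, d_out, num_experts=8, rank=4, top_k=2,
                 expert_dropout=0.1):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)        # frozen pretrained weight
        self.num_experts, self.top_k = num_experts, top_k
        self.expert_dropout = expert_dropout
        # Low-rank expert weights: A projects down to `rank`, B projects back up.
        self.A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.gate = nn.Linear(d_in, num_experts, bias=False)

    def forward(self, x):                              # x: (tokens, d_in)
        clean_logits = self.gate(x)                    # (tokens, E)
        logits = clean_logits
        if self.training and self.expert_dropout > 0:
            # Expert dropout: randomly mask gate logits so some experts
            # cannot be selected for this batch of tokens.
            mask = torch.rand_like(logits) < self.expert_dropout
            logits = logits.masked_fill(mask, -1e9)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        gates = F.softmax(topk_vals, dim=-1)           # (tokens, k)

        out = self.base(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                    # expert id per token
            A = self.A[idx]                            # (tokens, d_in, rank)
            B = self.B[idx]                            # (tokens, rank, d_out)
            delta = torch.bmm(torch.bmm(x.unsqueeze(1), A), B).squeeze(1)
            out = out + gates[:, slot:slot + 1] * delta

        # Auxiliary load-balancing loss: fraction of tokens routed to each
        # expert (top-1 slot) times its mean gate probability.
        dispatch = F.one_hot(topk_idx[:, 0], self.num_experts).float().mean(0)
        importance = F.softmax(clean_logits, dim=-1).mean(0)
        self.aux_loss = self.num_experts * (dispatch * importance).sum()
        return out


# Usage sketch: add the auxiliary loss to the task loss with a small weight.
layer = SiRALinear(16, 32)
x = torch.randn(10, 16)
y = layer(x)
loss = y.sum() + 0.01 * layer.aux_loss
```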
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Mixture of Experts, LoRA, Parameter Efficient Tuning
Languages Studied: English, Swahili, Bengali
Submission Number: 5155