SiRA: Sparse Mixture of Low Rank Adaptation

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: We propose Sparse Mixture of Low Rank Adaptation (SiRA), leveraging the Sparse Mixture of Experts (SMoE) for parameter-efficient tuning.
Abstract: Parameter Efficient Tuning (PET) techniques such as Low-rank Adaptation (LoRA) are effective methods for adapting Large Language Models to downstream tasks. While several prior works introduce dense computation in which the trainable parameters are shared by all input tokens, very few have explored the use of sparse and dynamic computation in PET methods. To bridge this gap, we propose Sparse Mixture of Low Rank Adaptation (SiRA), which leverages the Sparse Mixture of Experts (SMoE) framework to enforce conditional computation with top-k expert routing. We empirically find that each expert learns a distinct computation, which facilitates better performance. SiRA is optimized through a combination of training techniques, including an auxiliary loss encouraging load balancing, a capacity limit which restricts the maximum number of tokens each expert can process, and a novel expert dropout on top of the gating network. Through extensive experiments, we show that SiRA performs better than LoRA and other mixture-of-experts approaches across different single-task and multi-task settings.
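
To make the abstract's description concrete, the following is a minimal, hypothetical PyTorch sketch of a sparse mixture of LoRA experts with top-k gating and expert dropout on the gating network. The class name `SparseLoRAMoE`, the hyperparameter defaults, and the per-slot routing loop are assumptions for illustration rather than the authors' implementation, and the load-balancing auxiliary loss and expert capacity limit mentioned above are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLoRAMoE(nn.Module):
    """Sparse mixture of low-rank adapters with top-k gating (illustrative sketch)."""

    def __init__(self, d_model, rank=4, num_experts=8, top_k=2, expert_dropout=0.1):
        super().__init__()
        self.top_k = top_k
        self.expert_dropout = expert_dropout
        # Each expert is a LoRA-style pair of low-rank matrices: A (down-projection)
        # and B (up-projection, initialized to zero as in standard LoRA).
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        # Gating network scoring every expert for every token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model)
        logits = self.gate(x)  # (batch, seq, num_experts)
        if self.training and self.expert_dropout > 0:
            # Expert dropout on the gate: randomly suppress experts per token.
            drop = torch.rand_like(logits) < self.expert_dropout
            logits = logits.masked_fill(drop, -1e9)
        # Route each token to its top-k experts and renormalize their weights.
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)

        delta = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]             # (batch, seq) expert indices
            w = weights[..., slot].unsqueeze(-1)  # (batch, seq, 1)
            A = self.lora_A[idx]                  # (batch, seq, d_model, rank)
            B = self.lora_B[idx]                  # (batch, seq, rank, d_model)
            h = torch.einsum("bsd,bsdr->bsr", x, A)
            delta = delta + w * torch.einsum("bsr,bsrd->bsd", h, B)
        return delta
```

In use, the returned `delta` would be added to the output of the corresponding frozen linear layer, mirroring how a standard LoRA update is applied.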
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings
Languages Studied: English, Swahili, Bengali