SiRA: Sparse Mixture of Low Rank Adaptation

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: We propose Sparse Mixture of Low Rank Adaptation (SiRA), leveraging the Sparse Mixture of Experts (SMoE) for parameter-efficient tuning.
Abstract: Parameter Efficient Tuning (PET) techniques such as Low-rank Adaptation (LoRA) are effective methods for adapting Large Language Models to downstream tasks. While several prior works introduce dense computation in which the trainable parameters are shared by all input tokens, very few have explored the use of sparse and dynamic computation in PET methods. To bridge this gap, we propose Sparse Mixture of Low Rank Adaptation (SiRA), which leverages the Sparse Mixture of Experts (SMoE) framework to enforce conditional computation with top-k expert routing. We empirically find that each expert learns a distinct computation, which facilitates better performance. SiRA is optimized through a combination of training techniques, including an auxiliary loss encouraging load balancing, a capacity limit which restricts the maximum number of tokens each expert can process, and a novel expert dropout on top of the gating network. Through extensive experiments, we show that SiRA performs better than LoRA and other mixture-of-experts approaches across different single-task and multi-task settings.
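
To make the abstract's description concrete, the following is a minimal, hypothetical PyTorch sketch of a sparse mixture of LoRA experts with top-k gating and expert dropout on the gating network. The class name `SparseLoRAMoE`, the hyperparameter defaults, and the per-slot routing loop are assumptions for illustration rather than the authors' implementation, and the load-balancing auxiliary loss and expert capacity limit mentioned above are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLoRAMoE(nn.Module):
    """Sparse mixture of low-rank adapters with top-k gating (illustrative sketch)."""

    def __init__(self, d_model, rank=4, num_experts=8, top_k=2, expert_dropout=0.1):
        super().__init__()
        self.top_k = top_k
        self.expert_dropout = expert_dropout
        # Each expert is a LoRA-style pair of low-rank matrices: A (down-projection)
        # and B (up-projection, initialized to zero as in standard LoRA).
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        # Gating network scoring every expert for every token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model)
        logits = self.gate(x)  # (batch, seq, num_experts)
        if self.training and self.expert_dropout > 0:
            # Expert dropout on the gate: randomly suppress experts per token.
            drop = torch.rand_like(logits) < self.expert_dropout
            logits = logits.masked_fill(drop, -1e9)
        # Route each token to its top-k experts and renormalize their weights.
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)

        delta = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]             # (batch, seq) expert indices
            w = weights[..., slot].unsqueeze(-1)  # (batch, seq, 1)
            A = self.lora_A[idx]                  # (batch, seq, d_model, rank)
            B = self.lora_B[idx]                  # (batch, seq, rank, d_model)
            h = torch.einsum("bsd,bsdr->bsr", x, A)
            delta = delta + w * torch.einsum("bsr,bsrd->bsd", h, B)
        return delta
```

In use, the returned `delta` would be added to the output of the corresponding frozen linear layer, mirroring how a standard LoRA update is applied.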
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings
Languages Studied: English, Swahili, Bengali