ICML 2022 (modified: 03 Feb 2023)
Abstract: Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability, which enables dramatic increases in model size without s...
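The abstract is truncated here, but as background on what "sparsely activated" means in this setting, below is a minimal, illustrative sketch of a top-1 MoE layer: a learned gate routes each token to a single expert, so adding experts grows the parameter count while per-token compute stays roughly constant. This is a generic sketch assuming standard top-1 routing, not the paper's implementation; the class and parameter names (Top1MoE, d_model, num_experts) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Minimal top-1 Mixture-of-Experts layer (illustrative sketch only).

    Each token is routed to exactly one expert, so the layer is
    "sparsely activated": parameters scale with num_experts, but each
    token only pays the cost of a single expert's forward pass.
    """

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                  # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        top_prob, top_idx = probs.max(dim=-1)  # top-1 routing decision per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                # tokens routed to expert e
            if mask.any():
                # Scale expert output by its gate probability so the
                # routing decision stays differentiable end to end.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: 8 experts roughly 8x the FFN parameters, but each token
# still passes through only one expert.
layer = Top1MoE(d_model=64, num_experts=8)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```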