Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference

Shun Liao; Ting Chen; Tian Lin; Dengyong Zhou; Chong Wang

20 Oct 2018 (modified: 05 May 2023), NIPS 2018 Workshop CDNNRIA Blind Submission, Readers: Everyone

Abstract: Computing the softmax function in neural network models is expensive when the number of output classes is large. This can become a significant issue in both training and inference for such models. In this paper, we present Doubly Sparse Softmax (DS-Softmax), a Sparse Mixture of Sparse Experts, to improve the efficiency of softmax inference. During training, our method learns a two-level class hierarchy by dividing the entire output class space into several partially overlapping experts. Each expert is responsible for a learned subset of the output class space, and each output class belongs to only a small number of those experts. During inference, our method quickly locates the most probable expert and computes a small-scale softmax over that expert's classes. Our method is learning-based and requires no a priori knowledge of how the output class space is partitioned. We empirically evaluate our method on several real-world tasks and demonstrate that we can achieve significant computation reductions without loss of

TL;DR: We present doubly sparse softmax, a sparse mixture of sparse experts, which improves the efficiency of softmax inference by exploiting a two-level overlapping class hierarchy.

Keywords: hierarchical softmax, fast inference, model compression
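
The two-level inference described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ds_softmax_infer`, the top-1 gating by dot product, and the dictionary layout of the experts are all assumptions made for illustration, and any scaling of the expert logits by the gate value is omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a plain list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ds_softmax_infer(h, gate_weights, experts):
    """Hypothetical two-level inference.

    h            : context vector (list of floats)
    gate_weights : one gating vector per expert (assumed linear gate)
    experts      : list of dicts {class_id: class embedding}; each dict holds
                   only the classes that expert retained after training
    """
    # Level 1: pick the single most probable expert.
    gate_probs = softmax([dot(g, h) for g in gate_weights])
    k = max(range(len(gate_probs)), key=gate_probs.__getitem__)
    # Level 2: softmax restricted to the chosen expert's class subset.
    class_ids = sorted(experts[k])
    probs = softmax([dot(experts[k][c], h) for c in class_ids])
    return k, dict(zip(class_ids, probs))

# Toy example: 4 classes, 2 experts, class 2 shared by both (overlap).
h = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [-2.0, 0.0]]
experts = [
    {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [-1.0, 0.0]},
    {2: [1.0, 0.0], 3: [0.0, 1.0]},
]
expert, probs = ds_softmax_infer(h, gate_weights, experts)
```

In this toy setup the cost of the second step depends on the size of the selected expert's class subset rather than on the full number of output classes, which is the source of the savings the abstract refers to.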
