Abstract: Log-linear models are widely used in machine learning, and in particular are ubiquitous in deep learning architectures in the form of the softmax. While exact inference and learning of these models requires time linear in the number of classes, both can be performed approximately in sub-linear time with strong concentration guarantees. In this work, we present LSH Softmax, a method to perform sub-linear learning and inference of the softmax layer in the deep learning setting. Our method relies on the popular Locality-Sensitive Hashing to build a well-concentrated gradient estimator, using nearest neighbors and uniform samples. We also present a sub-linear inference scheme for LSH Softmax based on the Gumbel distribution. On language modeling, we show that Recurrent Neural Networks trained with LSH Softmax perform on par with computing the exact softmax while requiring only sub-linear computation.
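The abstract describes an estimator that combines the classes retrieved as nearest neighbors of the hidden state with uniformly sampled classes to estimate the softmax normalizer, and an inference scheme based on the Gumbel distribution. The sketch below is a minimal NumPy illustration of that idea only, not the paper's implementation: candidate classes are selected by exact top-k here (where the method uses LSH to avoid touching all classes), and the function names and sample sizes are hypothetical.

```python
import numpy as np

def approx_softmax_normalizer(W, h, num_neighbors=20, num_uniform=20, rng=None):
    """Sketch: estimate the softmax partition function over a large class set
    by summing the largest logits (the "nearest neighbors" of h, retrieved via
    LSH in the paper, computed exactly here for clarity) and adding an
    importance-weighted estimate of the remaining mass from uniform samples."""
    rng = np.random.default_rng() if rng is None else rng
    V = W.shape[0]
    logits = W @ h  # full logits computed only for this illustrative sketch

    # Candidate set: classes with the largest inner products with h.
    nn_idx = np.argpartition(-logits, num_neighbors)[:num_neighbors]
    rest = np.setdiff1d(np.arange(V), nn_idx)

    # Uniform samples from the remaining classes estimate their total mass.
    uni_idx = rng.choice(rest, size=num_uniform, replace=False)
    Z_nn = np.exp(logits[nn_idx]).sum()
    Z_uni = len(rest) * np.exp(logits[uni_idx]).mean()
    return Z_nn + Z_uni  # estimate of the partition function

def gumbel_max_sample(logits, rng):
    """Gumbel-max trick: argmax of logits plus Gumbel noise is an exact sample
    from the softmax distribution over those logits."""
    g = rng.gumbel(size=logits.shape)
    return int(np.argmax(logits + g))
```

In the actual method, the nearest-neighbor candidates are retrieved from LSH hash tables without scoring every class, which is what makes both the gradient estimate and the Gumbel-based sampling sub-linear in the number of classes.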
TL;DR: we present LSH Softmax, a softmax approximation layer for sub-linear learning and inference with strong theoretical guarantees; we showcase its applicability and efficiency by evaluating it on a real-world task: language modeling.
Keywords: LSH, softmax, deep learning, sub-linear, efficient, GPU
Data: [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)