Keywords: Locality Sensitive Hashing, LSH, Estimation, Sampling, Softmax, Attention
TL;DR: Locality-Sensitive Hashing is an efficient, informative sampler, capable of accurately estimating the softmax normalization constant in sub-linear time.
Abstract: The softmax function has multiple applications in large-scale machine learning. However, calculating the partition function is a major bottleneck for large state spaces. In this paper, we propose a new sampling scheme using locality-sensitive hashing (LSH) and an unbiased estimator that approximates the partition function accurately in sub-linear time. The samples are correlated and unnormalized, but the derived estimator is unbiased. We demonstrate the significant advantages of our proposal by comparing the speed and accuracy of LSH-Based Samplers (LSS) against other state-of-the-art estimation techniques.