Unbiased scalable softmax optimization
Nov 03, 2017 (modified: Dec 14, 2017) · ICLR 2018 Conference Blind Submission
Abstract: Recent state-of-the-art neural network and language models have begun to rely on softmax distributions with an extremely large number of categories. In this context, calculating the softmax normalizing constant is prohibitively expensive, which has spurred a growing literature of efficiently computable but biased estimates of the softmax. In this paper we present the first two unbiased algorithms for optimizing the softmax whose work per iteration is independent of the number of classes and datapoints (and which require no extra work at the end of each epoch). We compare their empirical performance to the state-of-the-art on seven real-world datasets, with our Implicit SGD algorithm comprehensively outperforming all competitors.
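To make the bottleneck the abstract describes concrete, here is a minimal NumPy sketch of the standard (exact) softmax cross-entropy gradient for a single example. The normalizing constant sums over all K classes, so each step costs O(K); this illustrates the baseline cost the paper's methods avoid, not the paper's own algorithms.

```python
import numpy as np

def softmax_xent_grad(logits, target):
    """Exact gradient of -log p(target) w.r.t. the logits.

    Computing the normalizer sum(exp(logits)) touches every one of
    the K classes; this O(K) cost per step is what makes the exact
    softmax prohibitive when K is extremely large.
    """
    z = logits - logits.max()            # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # O(K) normalizer
    grad = probs.copy()
    grad[target] -= 1.0                  # softmax(logits) - one_hot(target)
    return grad

# Toy usage with K = 5 classes (illustrative values only).
grad = softmax_xent_grad(np.arange(5, dtype=float), target=2)
```

Because the gradient is probs minus a one-hot vector, its entries always sum to zero and the target's entry is negative; biased approximations (e.g. sampled softmax) trade away this exactness for sublinear cost, which is the gap the paper's unbiased methods close.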
TL;DR: We propose the first methods for exactly optimizing the softmax distribution using stochastic gradients, with runtime independent of the number of classes or datapoints.
Keywords:softmax, optimization, implicit sgd