Pitfalls of Gaussians as a noise distribution in NCE

Holden Lee; Chirag Pabbaraju; Anish Prasad Sevekari; Andrej Risteski

Pitfalls of Gaussians as a noise distribution in NCE

Holden Lee, Chirag Pabbaraju, Anish Prasad Sevekari, Andrej Risteski

Published: 01 Feb 2023, Last Modified: 02 Mar 2023ICLR 2023 posterReaders: Everyone

Keywords: NCE, Noise Contrastive Estimation, Generative Models, statistical efficiency, theory

TL;DR: We show that using Gaussians as the noise distribution in Noise Contrastive Estimation can lead to exponentially bad statistical and algorithmic complexity.

Abstract: Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem for distinguishing training data from samples from an (easy-to-sample) noise distribution $q$, in a manner that avoids having to calculate a partition function. It is well-known that the choice of $q$ can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for $q$ is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss - even for very simple data distributions. As a consequence, both the statistical and algorithmic complexity for such a choice of $q$ will be problematic in practice - suggesting that more complex noise distributions are essential to the success of NCE.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Theory (eg, control theory, learning theory, algorithmic game theory)

16 Replies

Loading