Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation

29 Sept 2021 (edited 16 Mar 2022) · ICLR 2022 Spotlight
  • Keywords: noise contrastive estimation, contrastive learning, unsupervised learning, theory
  • Abstract: Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, this observation has never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribution are statistical or algorithmic in nature. In this work, we formally pinpoint the reasons for NCE's poor performance when an inappropriate noise distribution is used. Namely, we prove that these challenges arise due to an ill-behaved (more precisely, flat) loss landscape. To address this, we introduce a variant of NCE called \emph{eNCE}, which uses an exponential loss and for which \emph{normalized gradient descent} \emph{provably} addresses the landscape issues when the target and noise distributions belong to a given exponential family.
  • One-sentence Summary: This work theoretically explains the difficulty of optimizing the NCE loss when the noise distribution is poor, and provides a provably efficient solution consisting of normalized gradient descent (NGD) combined with the proposed \emph{eNCE} loss.
  • Supplementary Material: zip
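To make the setup concrete, here is a minimal NumPy sketch of standard NCE (the logistic-loss variant, not the paper's eNCE) trained with a normalized gradient descent step. The toy problem, distributions, and step size are all illustrative assumptions: the target is N(2, 1), the noise is N(0, 1), and the unnormalized model is a Gaussian with a learnable mean `mu` and learnable log-partition term `c`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumption, not from the paper):
# target = N(2, 1), noise = N(0, 1).
mu_true = 2.0
x_data = rng.normal(mu_true, 1.0, size=5000)   # samples from the target
x_noise = rng.normal(0.0, 1.0, size=5000)      # samples from the noise

def log_p_model(x, mu, c):
    # log of the unnormalized model density, minus the learned log-partition c
    return -0.5 * (x - mu) ** 2 - c

def log_p_noise(x):
    # exact log-density of the N(0, 1) noise distribution
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def nce_loss_and_grad(mu, c):
    # NCE classifies data vs. noise using the log-ratio r(x) = log p_model - log p_noise
    r_data = log_p_model(x_data, mu, c) - log_p_noise(x_data)
    r_noise = log_p_model(x_noise, mu, c) - log_p_noise(x_noise)
    # logistic loss: -log sigma(r) on data, -log(1 - sigma(r)) on noise
    loss = np.mean(np.logaddexp(0.0, -r_data)) + np.mean(np.logaddexp(0.0, r_noise))
    s_data = 1.0 / (1.0 + np.exp(-r_data))    # sigma(r) on data samples
    s_noise = 1.0 / (1.0 + np.exp(-r_noise))  # sigma(r) on noise samples
    # gradients w.r.t. mu (d log p_model / d mu = x - mu) and c (d log p_model / d c = -1)
    g_mu = np.mean(-(1.0 - s_data) * (x_data - mu)) + np.mean(s_noise * (x_noise - mu))
    g_c = np.mean(1.0 - s_data) - np.mean(s_noise)
    return loss, np.array([g_mu, g_c])

# Normalized gradient descent: step in the gradient direction with fixed length,
# so flat regions of the landscape do not stall progress.
theta = np.array([0.0, 0.0])  # [mu, c], initialized away from the optimum
for step in range(500):
    loss, grad = nce_loss_and_grad(*theta)
    theta -= 0.1 * grad / (np.linalg.norm(grad) + 1e-12)

print(theta)  # mu should move toward 2.0
```

With a fixed normalized step the iterates hop around the optimum within roughly one step length; the paper's point is that this kind of update, paired with the right (exponential) loss, provably escapes the flat regions that a badly chosen noise distribution creates.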