REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

George Tucker; Andriy Mnih; Chris J. Maddison; Jascha Sohl-Dickstein

08 Jul 2025 (modified: 17 Mar 2017); ICLR 2017; Readers: Everyone

Abstract: Learning in models with discrete latent variables is challenging due to high-variance gradient estimators. Existing approaches have generally relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient estimates. In this work, we combine the two approaches through a novel control variate that produces low-variance, unbiased gradient estimates. We present encouraging preliminary results on a toy problem and on learning sigmoid belief networks.

TL;DR: Combining REINFORCE with the Concrete relaxation to get low-variance, unbiased gradient estimates.
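To make the combination of REINFORCE and the Concrete relaxation concrete, here is a minimal pure-Python sketch for a single Bernoulli variable b ~ Bernoulli(theta). As we read the REBAR construction, a single-sample estimate of d/dtheta E[f(b)] is [f(b) - eta * f(sigma_lambda(z~))] * d/dtheta log p(b) + eta * d/dtheta f(sigma_lambda(z)) - eta * d/dtheta f(sigma_lambda(z~)), where z is the relaxed logistic sample with b = H(z), and z~ is an independent sample from p(z | b). Everything specific below is an illustrative assumption, not the authors' implementation: the toy objective f(b) = (b - 0.45)^2, the values theta = 0.3, lambda = 0.5, eta = 1, all helper names, and the use of central finite differences in place of automatic differentiation for the pathwise terms.

```python
import math
import random


def logit(p):
    return math.log(p) - math.log1p(-p)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def f(b, target=0.45):
    # Hypothetical toy objective (assumption): squared distance to a target.
    return (b - target) ** 2


def relaxed_sample(theta, u):
    # Logistic reparameterization: z = logit(theta) + logit(u), and b = H(z).
    return logit(theta) + logit(u)


def conditional_sample(theta, v, b):
    # z~ ~ p(z | b): draw u only from the interval consistent with b.
    if b == 1:
        u = (1.0 - theta) + theta * v  # u ~ U(1 - theta, 1), so z~ > 0
    else:
        u = (1.0 - theta) * v          # u ~ U(0, 1 - theta), so z~ < 0
    return logit(theta) + logit(u)


def d_theta(fn, theta, h=1e-5):
    # Central finite difference; a stand-in for autodiff in this sketch.
    return (fn(theta + h) - fn(theta - h)) / (2.0 * h)


def rebar_sample(theta, lam=0.5, eta=1.0, rng=random):
    # Keep u, v strictly inside (0, 1) so logit stays finite.
    u = rng.uniform(1e-12, 1.0 - 1e-12)
    v = rng.uniform(1e-12, 1.0 - 1e-12)
    b = 1 if relaxed_sample(theta, u) > 0 else 0
    dlogp = 1.0 / theta if b == 1 else -1.0 / (1.0 - theta)

    def relaxed_f(t):
        # f at the relaxed sample; u is held fixed while t varies.
        return f(sigmoid(relaxed_sample(t, u) / lam))

    def cond_relaxed_f(t):
        # f at the conditional relaxed sample; v and b are held fixed.
        return f(sigmoid(conditional_sample(t, v, b) / lam))

    control = cond_relaxed_f(theta)
    return ((f(b) - eta * control) * dlogp
            + eta * d_theta(relaxed_f, theta)
            - eta * d_theta(cond_relaxed_f, theta))


def estimate(theta, n=100_000, seed=0):
    # Monte Carlo mean and sample variance of the single-sample estimator.
    rng = random.Random(seed)
    samples = [rebar_sample(theta, rng=rng) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


theta = 0.3
exact = f(1) - f(0)  # d/dtheta [theta * f(1) + (1 - theta) * f(0)]
mean, var = estimate(theta)
print(f"exact={exact:.4f} rebar_mean={mean:.4f} rebar_var={var:.4f}")
```

Because the exact gradient of this toy objective is f(1) - f(0), the Monte Carlo mean should land close to it, which is the unbiasedness property the abstract claims. How much variance is saved relative to plain REINFORCE depends on f, lambda and eta, so this sketch makes no claim about that.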

Keywords: Unsupervised Learning, Reinforcement Learning, Optimization

Conflicts: google.com, stats.ox.ac.uk
