- TL;DR: Combining REINFORCE with the Concrete relaxation to get low-variance, unbiased gradient estimates.
- Abstract: Learning in models with discrete latent variables is challenging due to high-variance gradient estimators. Most approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al., 2016; Maddison et al., 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient estimates. In this work, we combine the two approaches through a novel control variate that produces low-variance, unbiased gradient estimates. We present encouraging preliminary results on a toy problem and on learning sigmoid belief networks.
- Keywords: Unsupervised Learning, Reinforcement Learning, Optimization
- Conflicts: google.com, stats.ox.ac.uk
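
To make the role of a control variate in the abstract concrete, here is a minimal sketch of the plain REINFORCE estimator with and without a baseline on a single Bernoulli variable. It is not the control variate proposed in this work, which is built from the Concrete relaxation; it uses a simple constant baseline only to show that subtracting a control variate leaves the estimate unbiased while reducing its variance. The objective `(b - t)**2`, the target `t = 0.45`, the parameter `theta = 0.3`, and all function names are assumptions chosen for illustration.

```python
import random
import statistics

T = 0.45  # assumed target for the illustrative objective


def f(b):
    """Illustrative objective evaluated at a binary sample b."""
    return (b - T) ** 2


def score(b, theta):
    """d/dtheta of log Bernoulli(b; theta)."""
    return b / theta - (1 - b) / (1 - theta)


def reinforce_samples(theta, n, baseline=0.0, seed=0):
    """Draw n single-sample REINFORCE gradient estimates.

    Subtracting a constant baseline keeps the estimator unbiased
    because the score function has zero mean under the sampling
    distribution.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        b = 1 if rng.random() < theta else 0
        estimates.append((f(b) - baseline) * score(b, theta))
    return estimates


theta = 0.3
# Exact gradient of E[f(b)] = theta * f(1) + (1 - theta) * f(0).
exact_grad = f(1) - f(0)

plain = reinforce_samples(theta, 20000)

# Assumed baseline: the expected objective value under the current theta.
baseline = theta * f(1) + (1 - theta) * f(0)
controlled = reinforce_samples(theta, 20000, baseline=baseline)

print("exact gradient     :", exact_grad)
print("plain mean / var   :", statistics.mean(plain), statistics.variance(plain))
print("baseline mean / var:", statistics.mean(controlled), statistics.variance(controlled))
```

Both sample means should land close to the exact gradient, while the baseline version has a much smaller variance. A relaxation-based estimator would instead trade that unbiasedness for lower variance, which is the tension the abstract describes.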