Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

Will Grathwohl; Dami Choi; Yuhuai Wu; Geoff Roeder; David Duvenaud

15 Feb 2018 (modified: 30 Mar 2025) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, based on gradients of a learned function. These estimators can be jointly trained with model parameters or policies, and are applicable in both discrete and continuous settings. We give unbiased, adaptive analogs of state-of-the-art reinforcement learning methods such as advantage actor-critic. We also demonstrate this framework for training discrete latent-variable models.
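To make the idea of an unbiased, lower-variance gradient estimate for a black-box function concrete, here is a minimal sketch. It is not the learned control variate the abstract describes: it uses the classic score-function (REINFORCE) estimator with a fixed constant baseline, which is the simplest control variate. The objective `f`, the constant `T`, the baseline value `0.25`, and the helper `grad_samples` are all illustrative assumptions, not taken from the paper or its code.

```python
import random
from statistics import fmean, pvariance

T = 0.45  # hypothetical constant for the toy objective (an assumption)


def f(b):
    """Black-box objective: only evaluated, never differentiated."""
    return (b - T) ** 2


def grad_samples(theta, baseline=0.0, n_samples=50_000, seed=0):
    """Per-sample score-function estimates of d/dtheta E[f(b)], b ~ Bernoulli(theta)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        b = 1.0 if rng.random() < theta else 0.0
        # d/dtheta log p(b | theta) for a Bernoulli variable
        score = b / theta - (1.0 - b) / (1.0 - theta)
        # Subtracting a constant keeps the estimate unbiased because E[score] = 0
        samples.append((f(b) - baseline) * score)
    return samples


theta = 0.5
# Closed form: d/dtheta [theta * (1 - T)^2 + (1 - theta) * T^2] = 1 - 2T
exact = 1.0 - 2.0 * T
plain = grad_samples(theta)
with_baseline = grad_samples(theta, baseline=0.25)

print(f"exact gradient     : {exact:.4f}")
print(f"plain estimator    : mean={fmean(plain):.4f} var={pvariance(plain):.6f}")
print(f"baseline estimator : mean={fmean(with_baseline):.4f} var={pvariance(with_baseline):.6f}")
```

Both estimators target the same gradient, but the one with a baseline has much lower variance. The framework in the abstract replaces the hand-picked constant with a learned function trained to reduce variance.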

TL;DR: We present a general method for unbiased estimation of gradients of black-box functions of random variables. We apply this method to discrete variational inference and reinforcement learning.

Keywords: optimization, machine learning, variational inference, reinforcement learning, gradient estimation, deep learning, discrete optimization

Code: [duvenaud/relax](https://github.com/duvenaud/relax) + [6 community implementations](https://paperswithcode.com/paper/?openreview=SyzKd1bCW)

Data: [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym)

Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/backpropagation-through-the-void-optimizing/code)
