Abstract: Stochastic gradient descent is a workhorse of modern deep learning. The gradient of interest is almost always the gradient of an expectation, which is unavailable in closed form. The pathwise and score-function gradient estimators are the most common approaches to estimating the gradient of an expectation. When applicable, the pathwise gradient estimator is often preferred over the score-function gradient estimator because it has substantially lower variance; indeed, the latter is almost always applied with some variance reduction technique. However, a series of works suggests that, in the context of variational inference, pathwise gradient estimators may also benefit from variance reduction. In this work, we review existing control-variate-based variance reduction methods for pathwise gradient estimators to determine their effectiveness. Work in this vein generally relies on approximations of the integrand, which requires the functional form of the variational family to be simple. In light of this limitation, we also propose applying zero-variance control variates to pathwise gradient estimators; these control variates have the advantage of requiring few assumptions on the variational distribution, beyond the ability to sample from it.
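For reference, the two estimators named in the abstract can be written in standard form (notation here is our own and not taken from the submission): with variational distribution $q_\theta(z)$, integrand $f$, and reparameterization $z = g_\theta(\epsilon)$ for $\epsilon \sim p(\epsilon)$,
$$\nabla_\theta\, \mathbb{E}_{q_\theta(z)}[f(z)] = \mathbb{E}_{p(\epsilon)}\big[\nabla_\theta f\big(g_\theta(\epsilon)\big)\big] \quad \text{(pathwise)},$$
$$\nabla_\theta\, \mathbb{E}_{q_\theta(z)}[f(z)] = \mathbb{E}_{q_\theta(z)}\big[f(z)\,\nabla_\theta \log q_\theta(z)\big] \quad \text{(score function)}.$$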
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=ydDQrzPuNd
Changes Since Last Submission: Changed the font to Computer Modern Bright and reduced the vertical spacing to 11 points.
Assigned Action Editor: ~Francisco_J._R._Ruiz1
Submission Number: 1528