Keywords: algorithmic differentiation, automatic differentiation, stochastic backpropagation, Monte Carlo derivatives, pathwise derivatives
TL;DR: We propose an optimal smoothing parametrization for gradient estimators of expectations of discontinuous functions.
Abstract: We propose an optimal smoothing parametrization for gradient estimators of expectations of discontinuous functions. Applying the reparametrization trick to discontinuous functions gives gradient estimators for discrete random variables and makes smoothing applicable in machine learning settings (e.g., variational inference and stochastic neural networks). Our approach is based on an objective that can be optimized simultaneously with the primal optimization task. Optimal smoothing is general-purpose in the sense that it requires only an extension of the algorithmic differentiation tool, with no need to rearrange the model.
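As a rough illustration of why smoothing matters for pathwise gradients of discontinuous functions, the sketch below estimates d/dtheta E[H(theta + eps)] with eps ~ N(0, 1), where H is the Heaviside step. The pathwise derivative of H itself is zero almost everywhere, so the sketch replaces H with a logistic sigmoid of bandwidth h. This is not the method proposed here: the function names, the sigmoid surrogate, and the hand-picked h are assumptions made only for illustration, and no optimal bandwidth is computed.

```python
import math
import random


def smoothed_step_grad(theta, h, n_samples=100_000, seed=0):
    """Pathwise Monte Carlo estimate of d/dtheta E[sigmoid((theta + eps) / h)].

    The sigmoid is an assumed smooth surrogate for the Heaviside step;
    h is a fixed, hand-chosen bandwidth (hypothetical, not optimized).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = theta + rng.gauss(0.0, 1.0)            # reparametrized sample
        s = 0.5 * (1.0 + math.tanh(0.5 * x / h))   # numerically stable sigmoid(x / h)
        total += s * (1.0 - s) / h                 # derivative of sigmoid(x / h) w.r.t. theta
    return total / n_samples


def exact_grad(theta):
    """Exact derivative of E[H(theta + eps)], i.e. the standard normal density at theta."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    theta = 0.3
    for h in (1.0, 0.3, 0.1):
        print(h, smoothed_step_grad(theta, h), exact_grad(theta))
```

In this toy setting a smaller h reduces the bias of the smoothed estimator but increases its variance, which is the trade-off that a principled choice of smoothing parameter has to balance.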