Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient
  estimators are used to estimate gradients of expectations throughout
  machine learning and reinforcement learning; however, they are usually
  explained as simple mathematical tricks, with no insight into their
  nature. We use a first-principles approach to explain that LR and RP
  are alternative methods of keeping track of the movement of
  probability mass, and the two are connected via the divergence
  theorem. Moreover, we show that the space of all possible estimators
  combining LR and RP can be completely parameterized by a flow field
  $u(x)$ and an importance sampling distribution
  $q(x)$. We prove that there cannot exist a single-sample
  estimator of this type outside our characterized space, thus
  clarifying where we should be searching for better Monte Carlo
  gradient estimators.
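
As a toy illustration (not taken from the paper), the sketch below compares single-sample LR and RP estimates of $\nabla_\theta \mathbb{E}_{x\sim\mathcal{N}(\theta,\sigma^2)}[f(x)]$ for the assumed test function $f(x)=x^2$, whose true gradient is $2\theta$; the parameter values and NumPy implementation are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): estimate
#   d/dtheta E_{x ~ N(theta, sigma^2)}[f(x)]   with f(x) = x^2,
# whose exact value is 2 * theta.

def f(x):
    return x ** 2

def df(x):
    return 2.0 * x

theta, sigma, n = 1.5, 0.7, 200_000
rng = np.random.default_rng(0)
eps = rng.standard_normal(n)
x = theta + sigma * eps  # reparameterized samples x = theta + sigma * eps

# Likelihood ratio (score-function) estimator:
#   f(x) * d/dtheta log N(x; theta, sigma^2) = f(x) * (x - theta) / sigma^2
lr_grad = np.mean(f(x) * (x - theta) / sigma ** 2)

# Reparameterization (pathwise) estimator:
#   d/dtheta f(theta + sigma * eps) = f'(theta + sigma * eps)
rp_grad = np.mean(df(x))

print(f"true gradient: {2 * theta:.4f}")
print(f"LR  estimate : {lr_grad:.4f}")  # unbiased, typically higher variance
print(f"RP  estimate : {rp_grad:.4f}")  # unbiased, typically lower variance
```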