Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient
estimators are used to estimate gradients of expectations throughout
machine learning and reinforcement learning; however, they are usually
explained as simple mathematical tricks that offer no insight into their
nature. We use a first-principles approach to explain that LR and RP
are alternative methods of keeping track of the movement of
probability mass, and the two are connected via the divergence
theorem. Moreover, we show that the space of all possible estimators
combining LR and RP can be completely parameterized by a flow field
$u(x)$ and an importance sampling distribution
$q(x)$. We prove that no single-sample estimator of this type can exist
outside our characterized space, thus clarifying where to search for
better Monte Carlo gradient estimators.
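As a minimal illustration of the two standard estimators (not the paper's generalized LR/RP family), the sketch compares them on a case with a known answer. The target is the gradient of $\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2)}[x^2] = \mu^2+\sigma^2$ with respect to $\mu$, which is $2\mu$. LR weights $f(x)$ by the score $\partial_\mu \log p(x;\mu,\sigma) = (x-\mu)/\sigma^2$. RP writes $x = \mu + \sigma\epsilon$ and differentiates $f$ along the sample path. For this Gaussian location family, every sample shifts one-for-one with $\mu$, which corresponds to a constant flow field $u(x) = 1$. The test function, the Gaussian, and the helper names `lr_gradient` and `rp_gradient` are hypothetical choices made for illustration, and the code assumes NumPy is available.

```python
import numpy as np

def f(x):
    # Hypothetical test function whose expectation we differentiate: f(x) = x**2.
    return x ** 2

def df(x):
    # Derivative of f; only the reparameterization estimator needs it.
    return 2.0 * x

def lr_gradient(mu, sigma, n, rng):
    # Likelihood-ratio (score-function) estimate of d/dmu E_{x~N(mu, sigma^2)}[f(x)]:
    # the sample mean of f(x) * d/dmu log p(x; mu, sigma), with score (x - mu) / sigma**2.
    x = rng.normal(mu, sigma, size=n)
    score = (x - mu) / sigma ** 2
    return np.mean(f(x) * score)

def rp_gradient(mu, sigma, n, rng):
    # Reparameterization (pathwise) estimate: x = mu + sigma * eps with eps ~ N(0, 1),
    # so dx/dmu = 1 and the gradient is the sample mean of f'(x) * dx/dmu.
    eps = rng.standard_normal(n)
    x = mu + sigma * eps
    dx_dmu = 1.0
    return np.mean(df(x) * dx_dmu)

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000
exact = 2.0 * mu  # d/dmu (mu**2 + sigma**2) = 2 * mu
print(f"exact {exact:.3f}")
print(f"LR    {lr_gradient(mu, sigma, n, rng):.3f}")
print(f"RP    {rp_gradient(mu, sigma, n, rng):.3f}")
```

Both estimates should match the exact value up to Monte Carlo error. On this example the RP estimate is much less noisy. At $\mu = 1.5$, $\sigma = 1$, its per-sample variance is 4, compared with about 52 for LR.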