Abstract: The paper provides a thorough investigation of Direct Loss Minimization (DLM), which optimizes the posterior to minimize predictive loss, in sparse Gaussian processes. For the conjugate case, we consider DLM for log-loss and DLM for square loss, showing a significant performance improvement in both cases. The application of DLM in non-conjugate cases is more complex because the logarithm of the expectation in the log-loss DLM objective is often intractable, and simple sampling leads to biased gradient estimates. The paper makes two technical contributions to address this. First, a new method using product sampling is proposed, which gives unbiased gradient estimates (uPS) for the objective function. Second, a theoretical analysis of biased Monte Carlo estimates (bMC) shows that stochastic gradient descent converges despite the biased gradients. Experiments demonstrate the empirical success of DLM. A comparison of the sampling methods shows that, while uPS is potentially more sample-efficient, bMC provides a better tradeoff in terms of convergence time and computational efficiency.
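As context for the non-conjugate difficulty described above, the following is a minimal sketch of why a simple Monte Carlo (bMC) estimate of the log-loss DLM term is biased. It assumes a Bernoulli likelihood with a logistic link and a Gaussian variational marginal q(f); these modeling choices, the function names, and the parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): the per-point log-loss DLM
# term is -log E_{q(f)}[p(y | f)].  A naive "bMC" estimator replaces the
# expectation with a Monte Carlo average *inside* the logarithm,
#     -log( (1/S) * sum_s p(y | f_s) ),   f_s ~ q(f),
# which is biased because E[log X] != log E[X] (Jensen's inequality); the
# paper's analysis concerns SGD convergence despite this kind of bias.

rng = np.random.default_rng(0)

def bmc_log_loss(y, mu, sigma, n_samples):
    """Biased MC estimate of -log E_q[p(y|f)] for q = N(mu, sigma^2)
    and a Bernoulli likelihood with a logistic link (illustrative choice)."""
    f = rng.normal(mu, sigma, size=n_samples)   # f_s ~ q(f)
    p1 = 1.0 / (1.0 + np.exp(-f))               # p(y=1 | f_s)
    lik = p1 if y == 1 else 1.0 - p1            # p(y | f_s)
    return -np.log(np.mean(lik))                # log of the average -> biased

# With few samples the average estimate overshoots the large-sample value,
# and the gap shrinks as the number of samples S grows.
for S in (1, 10, 10_000):
    est = np.mean([bmc_log_loss(1, mu=0.5, sigma=1.0, n_samples=S)
                   for _ in range(2_000)])
    print(f"S = {S:6d}: mean bMC estimate = {est:.4f}")
```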