% !TEX root = ../main.tex
CVGP enables a complementary Bayesian coreset learning-based perspective on sparse $\gp$ inference.
Methodologically, CVGP maximizes the loss in Equation~\eqref{eq:loss_coreset_posterior_gp_analytical} for $\gp$ posterior inference;
\ie it maximizes the variational lower-bound $\Loss_{CVGP}$
with respect to CVGP parameters $\{\XbC, \ybC, \betabC\}$,
encouraging approximations that minimize the gap to the true $\gp$ posterior.
We note that
$\Loss_{CVGP} \rightarrow \Loss = \log \p{\yb}$
implies $\Delta_{CVGP}=\kl{\cq{\fb,\fbC}{\XbC, \ybC, \betabC}}{\cp{\fb,\fbC}{\yb}} \rightarrow 0$.
Hence,
CVGP learns coresets that minimize the distance between
its variational distribution
and the true $\gp$ posterior.
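This implication can be read off the standard variational decomposition of the log-marginal likelihood,
which we restate here under the assumption that $\Loss_{CVGP}$ is the evidence lower bound
associated with the variational distribution $\cq{\fb,\fbC}{\XbC, \ybC, \betabC}$:
\begin{equation*}
\log \p{\yb}
= \Loss_{CVGP}
+ \kl{\cq{\fb,\fbC}{\XbC, \ybC, \betabC}}{\cp{\fb,\fbC}{\yb}}
= \Loss_{CVGP} + \Delta_{CVGP} \; .
\end{equation*}
Because $\Delta_{CVGP} \geq 0$ and $\log \p{\yb}$ does not depend on $\{\XbC, \ybC, \betabC\}$,
any increase in $\Loss_{CVGP}$ with respect to the coreset parameters
is matched by an equal decrease in $\Delta_{CVGP}$.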

To do so, it finds ---indirectly, yet efficiently--- a sparse representation of the data (\ie the coreset triplet)
that captures as much information as possible about the $\gp$ posterior of interest,
as measured by the KL divergence between CVGP's variational posterior and the true posterior.
% Initialization
Initial estimates of the coreset triplet $\{\XbC, \ybC, \betabC\}$
can be selected randomly or using k-means
% \footnote{
% Any technique that may help with initialization, from either the coreset literature or the stochastic optimization literature, is readily applicable to CVGP.
% }
(we evaluate CVGP's robustness to coreset initialization
in Appendices~\ref{asssec:app_exp_robustness} and~\ref{asec:qual_study});
we recommend k-means initialization.

% Learning and downweighting
Importantly, CVGP's learning procedure enables an
\textbf{automatic relevance determination of pseudo-points} $\{\XbC, \ybC\}$
via adaptation of their $\betabC$ values:
\ie CVGP has the inherent flexibility to
up-weight the pseudo-points that are deemed important to describe the observed data
and to down-weight (``\textit{ignore}'') those that are not
---see experiments in Section~\ref{ssec:exp_coresets}.

Namely, inspection of $\cq{\fb}{\XbC, \ybC, \betabC}$
elucidates which learned coreset tuples $\{\XbC, \ybC\}$, weighted by $\betabC$,
best describe the $\gp$ posterior
---as illustrated in Figure~\ref{fig:exp_coresets_predictive}.
% Interpretability
We note that CVGP's coreset-based variational posteriors,
when derived from the function-space and weight-space views of $\gp$s
---see Appendix Section~\ref{asec:cvtgp_derivation} for both derivations---
provide complementary posterior insights.