Abstract: Current methods for stochastic hyperparameter learning in Gaussian Processes (GPs) rely
on approximations, such as computing biased stochastic gradients or using inducing points in
stochastic variational inference. However, when using such methods, we are not guaranteed
to converge to a stationary point of the true marginal likelihood. In this work, we propose
algorithms for exact stochastic inference of GPs with kernels that induce a Reproducing
Kernel Hilbert Space (RKHS) of moderate finite dimension. Our approach can also be
extended to infinite dimensional RKHSs at the cost of forgoing exactness. Both for finite and
infinite dimensional RKHSs, our method achieves better experimental results than existing
methods when memory resources limit the feasible batch size and the possible number of
inducing points.
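To make the finite-dimensional setting concrete, here is a minimal sketch (not the paper's algorithm) of why a finite feature map helps. It assumes a kernel of the form k(x, x') = φ(x)ᵀφ(x') with an explicit n×m feature matrix `Phi`; the function names and the `noise` (noise variance) parameter are illustrative. By the Woodbury identity and the matrix determinant lemma, the negative log marginal likelihood can be evaluated from the m×m matrix ΦᵀΦ and the vector Φᵀy, both of which are sums over samples.

```python
import numpy as np

def nll_kernel_space(Phi, y, noise):
    """Negative log marginal likelihood using the n x n kernel matrix."""
    n = Phi.shape[0]
    K = Phi @ Phi.T + noise * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)
    return 0.5 * (quad + logdet + n * np.log(2 * np.pi))

def nll_feature_space(Phi, y, noise):
    """Same quantity computed from m x m statistics only."""
    n, m = Phi.shape
    A = Phi.T @ Phi + noise * np.eye(m)  # sum over samples of phi_i phi_i^T, plus noise
    b = Phi.T @ y                        # sum over samples of phi_i * y_i
    _, logdet_A = np.linalg.slogdet(A)
    # Woodbury identity: y^T K^{-1} y = (y^T y - b^T A^{-1} b) / noise
    quad = (y @ y - b @ np.linalg.solve(A, b)) / noise
    # Matrix determinant lemma: log det K = log det A + (n - m) log noise
    logdet = logdet_A + (n - m) * np.log(noise)
    return 0.5 * (quad + logdet + n * np.log(2 * np.pi))

# Hypothetical synthetic data, used only to compare the two forms.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
print(nll_kernel_space(Phi, y, 0.3), nll_feature_space(Phi, y, 0.3))
```

The two functions agree up to floating-point error. The second form never builds an n×n matrix, which is the property that makes sample-wise (stochastic) accumulation plausible when m is moderate.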
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=nVRpd28Fms
Changes Since Last Submission:
This revision addresses the remaining reviewer concerns about empirical clarity, SVGP fairness, and the scope of the exactness guarantees.
1. **Clarified why naïve mini-batching is biased.** Added an early explanation that the GP log-determinant term is global and does not decompose over samples, so the natural mini-batch objective generally gives biased gradients of the full marginal likelihood.
2. **Replaced ambiguous aggregate pie-chart summaries with a quantitative table.** Added a win/loss table broken down by kernel, metric, method, and batch size, making the marginal-likelihood and RMSE comparisons easier to verify.
3. **Strengthened the SVGP comparison.** Clarified why setting the number of inducing points equal to the mini-batch size is a balanced compute/memory allocation, noted that this is not claimed to be optimal, and discussed the inducing-point/batch-size trade-off more explicitly.
4. **Expanded the explanation of NLL versus RMSE behavior.** Added discussion that marginal-likelihood optimization and RMSE are aligned under correct model specification, but in realistic misspecified settings lower negative log marginal likelihood need not monotonically imply lower test RMSE.
5. **Added discussion of finite-feature approximations beyond RFF.** Added text explaining how fixed Nyström features fit the proposed framework, while clarifying that adaptive or mini-batch-dependent landmark selection changes the objective and is outside the current analysis.
6. **Sharpened the scope of the exactness guarantee.** Expanded the limitations to state that exactness holds for the finite-dimensional kernel actually optimized; for Gaussian, Matérn, and other infinite-dimensional kernels, finite-feature approximations such as RFF or Nyström forfeit exactness with respect to the original kernel.
7. **Clarified complexity and memory accounting.** Revised the complexity discussion to define the per-iteration processed-point count consistently and to make the memory cost of inducing points in SVGP explicit.
8. **Updated cross-references and result discussion.** Updated the GP regression discussion to refer to the new win/loss table and batch-size plots, and toned down the interpretation to distinguish optimization gains in marginal likelihood from mixed predictive-RMSE behavior.
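
The point in item 1 can be checked numerically. The sketch below is an illustration under assumed synthetic data, not code from the submission: it compares the full log-determinant of the GP covariance with the average of rescaled mini-batch log-determinants. It looks at the objective value rather than its gradient, and the feature matrix `Phi`, the noise variance, and the helper names are all hypothetical.

```python
import numpy as np

def logdet_gram(Phi, noise):
    """log det(Phi Phi^T + noise * I) for the rows of Phi."""
    K = Phi @ Phi.T + noise * np.eye(Phi.shape[0])
    return np.linalg.slogdet(K)[1]

def naive_minibatch_logdet(Phi, noise, batch_size, n_draws, rng):
    """Average of (n / b) * log det(K_BB) over random mini-batches B."""
    n = Phi.shape[0]
    estimates = []
    for _ in range(n_draws):
        idx = rng.choice(n, size=batch_size, replace=False)
        estimates.append((n / batch_size) * logdet_gram(Phi[idx], noise))
    return float(np.mean(estimates))

rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 3))  # hypothetical 3-dimensional features
full = logdet_gram(Phi, noise=0.1)
naive = naive_minibatch_logdet(Phi, noise=0.1, batch_size=20, n_draws=500, rng=rng)
print(f"full log-det: {full:.1f}, naive mini-batch average: {naive:.1f}")
```

Averaging over many batches does not close the gap, because the log-determinant depends on all pairwise kernel entries and is not a sum of per-sample terms.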
Assigned Action Editor: ~Pan_Xu1
Submission Number: 7825