Unbiased Stochastic Optimization for Gaussian Processes on Finite Dimensional RKHS

TMLR Paper7825 Authors

07 Mar 2026 (modified: 06 May 2026), Under review for TMLR, CC BY 4.0

Abstract: Current methods for stochastic hyperparameter learning in Gaussian Processes (GPs) rely on approximations, such as computing biased stochastic gradients or using inducing points in stochastic variational inference. However, when using such methods, we are not guaranteed to converge to a stationary point of the true marginal likelihood. In this work, we propose algorithms for exact stochastic inference of GPs with kernels that induce a Reproducing Kernel Hilbert Space (RKHS) of moderate finite dimension. Our approach can also be extended to infinite dimensional RKHSs at the cost of forgoing exactness. Both for finite and infinite dimensional RKHSs, our method achieves better experimental results than existing methods when memory resources limit the feasible batch size and the possible number of inducing points.

Submission Type: Long submission (more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=nVRpd28Fms

Changes Since Last Submission: This revision addresses the remaining reviewer concerns about empirical clarity, SVGP fairness, and the scope of the exactness guarantees.

1. **Clarified why naïve mini-batching is biased.** Added an early explanation that the GP log-determinant term is global and does not decompose over samples, so the natural mini-batch objective generally gives biased gradients of the full marginal likelihood.
2. **Replaced ambiguous aggregate pie-chart summaries with a quantitative table.** Added a win/loss table broken down by kernel, metric, method, and batch size, making the marginal-likelihood and RMSE comparisons easier to verify.
3. **Strengthened the SVGP comparison.** Clarified why setting the number of inducing points equal to the mini-batch size is a balanced compute/memory allocation, noted that this is not claimed to be optimal, and discussed the inducing-point/batch-size trade-off more explicitly.
4. **Expanded the explanation of NLL versus RMSE behavior.** Added discussion that marginal-likelihood optimization and RMSE are aligned under correct model specification, but in realistic misspecified settings lower negative log marginal likelihood need not monotonically imply lower test RMSE.
5. **Added discussion of finite-feature approximations beyond RFF.** Added text explaining how fixed Nyström features fit the proposed framework, while clarifying that adaptive or mini-batch-dependent landmark selection changes the objective and is outside the current analysis.
6. **Sharpened the scope of the exactness guarantee.** Expanded the limitations to state that exactness holds for the finite-dimensional kernel actually optimized; for Gaussian, Matérn, and other infinite-dimensional kernels, finite-feature approximations such as RFF or Nyström forfeit exactness with respect to the original kernel.
7. **Clarified complexity and memory accounting.** Revised the complexity discussion to define the per-iteration processed-point count consistently and to make the memory cost of inducing points in SVGP explicit.
8. **Updated cross-references and result discussion.** Updated the GP regression discussion to refer to the new win/loss table and batch-size plots, and toned down the interpretation to distinguish optimization gains in marginal likelihood from mixed predictive-RMSE behavior.
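The bias of naïve mini-batching described in the first change can be checked numerically. The sketch below is only an illustration, not the paper's algorithm: it assumes a kernel given by an explicit feature matrix `Phi` (so that K = Phi Phi^T, a stand-in for a finite-dimensional kernel), uses the noise variance as the only hyperparameter, and all function names are hypothetical. It compares the gradient of the full negative log marginal likelihood with the exact expectation of the rescaled mini-batch gradient, obtained by enumerating every batch of a given size.

```python
from itertools import combinations

import numpy as np


def nll_noise_grad(Phi, y, noise_var):
    """Gradient of the GP negative log marginal likelihood w.r.t. the noise variance.

    Assumes the kernel matrix is K = Phi @ Phi.T + noise_var * I.
    d/ds [0.5 * y^T K^{-1} y + 0.5 * log det K] = 0.5 * (tr(K^{-1}) - ||K^{-1} y||^2)
    """
    n = Phi.shape[0]
    K = Phi @ Phi.T + noise_var * np.eye(n)
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    return 0.5 * (np.trace(K_inv) - alpha @ alpha)


def expected_naive_grad(Phi, y, noise_var, batch_size):
    """Exact expectation of the naive mini-batch gradient.

    The naive estimator evaluates the marginal-likelihood gradient on the batch
    alone and rescales it by n / batch_size. Averaging over all batches of the
    given size removes Monte Carlo noise from the comparison.
    """
    n = Phi.shape[0]
    grads = [
        (n / batch_size) * nll_noise_grad(Phi[list(idx)], y[list(idx)], noise_var)
        for idx in combinations(range(n), batch_size)
    ]
    return float(np.mean(grads))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 8, 2                      # illustrative sizes: 8 points, 2 features
    Phi = rng.normal(size=(n, m))    # hypothetical feature matrix
    y = Phi @ rng.normal(size=m) + 0.1 * rng.normal(size=n)

    full = nll_noise_grad(Phi, y, noise_var=1.0)
    naive = expected_naive_grad(Phi, y, noise_var=1.0, batch_size=4)
    print(f"full-data gradient:           {full:.4f}")
    print(f"expected mini-batch gradient: {naive:.4f}")
```

If the objective decomposed over samples, the two printed numbers would agree. They generally do not, because both the log-determinant and the quadratic term couple all points through the kernel matrix.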

Assigned Action Editor: ~Pan_Xu1

Submission Number: 7825
