Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

Published: 20 Jun 2023, Last Modified: 18 Jul 2023 · AABI 2023 - Fast Track
Keywords: Marginal Likelihood, Bayesian Model Selection, Laplace approximation, Neural Tangent Kernel
TL;DR: We derive lower bounds of the linearized Laplace approximation to the marginal likelihood that enable SGD-based hyperparameter optimization
Abstract: Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters on the training data using gradients, just like standard neural network parameters. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow trading off estimation accuracy against computational complexity. We derive these bounds using the functional form of the linearized Laplace, which can be estimated using the neural tangent kernel. Experimentally, we show the tightness of the proposed bounds and their performance on gradient-based hyperparameter optimization.
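As a rough illustration of the idea, the sketch below (JAX) computes a GP-style log marginal likelihood on a single minibatch, using the empirical neural tangent kernel of a small MLP as the covariance, and differentiates it with respect to two hyperparameters (a prior precision and an observation-noise scale). This is a simplified, assumed setup for exposition, not the paper's exact lower-bound estimator; the model, hyperparameter names, and minibatch scheme are illustrative.

```python
# Minimal sketch (illustrative, not the paper's estimator): stochastic
# hyperparameter gradients of a Gaussian log marginal likelihood whose
# covariance is the empirical NTK of a small MLP on one minibatch.
import jax
import jax.numpy as jnp

def init_params(key, sizes):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

def empirical_ntk(params, X):
    # Jacobian of the network outputs w.r.t. all parameters, flattened,
    # then K = J J^T gives the (batch, batch) empirical NTK.
    jac = jax.jacrev(mlp)(params, X)
    J = jnp.concatenate(
        [j.reshape(X.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)],
        axis=1)
    return J @ J.T

def log_marglik(hypers, params, X, y):
    # GP-style log marginal likelihood with covariance K / delta + sigma^2 I,
    # parameterized by log prior precision and log noise std.
    log_delta, log_sigma = hypers
    K = empirical_ntk(params, X) / jnp.exp(log_delta)
    cov = K + jnp.exp(2 * log_sigma) * jnp.eye(X.shape[0])
    resid = y - mlp(params, X)
    chol = jnp.linalg.cholesky(cov)
    alpha = jax.scipy.linalg.cho_solve((chol, True), resid)
    return (-0.5 * resid @ alpha
            - jnp.sum(jnp.log(jnp.diag(chol)))
            - 0.5 * X.shape[0] * jnp.log(2 * jnp.pi))

key, pkey, xkey = jax.random.split(jax.random.PRNGKey(0), 3)
params = init_params(pkey, [5, 32, 1])
X = jax.random.normal(xkey, (16, 5))          # one minibatch
y = jnp.sin(X[:, 0])
hypers = jnp.array([0.0, -1.0])               # log prior precision, log noise std
grads = jax.grad(log_marglik)(hypers, params, X, y)  # stochastic hyperparameter gradient
```

In this simplified form, each gradient step touches only a minibatch-sized kernel matrix rather than the full dataset, which is the computational trade-off the abstract describes.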
Publication Venue: ICML 2023
Submission Number: 18