Learning Gaussian Processes with Bayesian Posterior Optimization

Luiz F. O. Chamon, Santiago Paternain, Alejandro Ribeiro

Published: 2019, Last Modified: 27 Sept 2024, ACSSC 2019, CC BY-SA 4.0

Abstract: Gaussian processes (GPs) are often used as prior distributions in non-parametric Bayesian methods due to their numerical and analytical tractability. GP priors are specified by choosing a covariance function (along with its hyperparameters), a choice that is not only challenging in practice but also has a profound impact on performance. This issue is typically addressed using hierarchical models, i.e., by learning a distribution over covariance functions/hyperparameters that defines a mixture of GPs. Yet, since choosing priors for hyperparameters can be challenging, maximum likelihood is often used instead to obtain point estimates. This approach, however, involves solving a non-convex optimization problem and is thus prone to overfitting. To address these issues, this work proposes a hybrid Bayesian-optimization solution in which the hyperparameter posterior distribution is obtained not from Bayes' rule, but as the solution of a mathematical program. Explicitly, we obtain the hyperparameter distribution that minimizes a risk measure induced by the GP mixture. Previous knowledge, including properties such as sparsity and maximum entropy, is incorporated through (possibly non-convex) penalties instead of a prior. We prove that despite its infinite dimensionality and potential non-convexity, this problem can be solved exactly using duality and stochastic optimization.
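The abstract does not specify the risk measure, the penalties, or the solver, so the following is only a minimal sketch of the general idea under loudly simplified assumptions, not the authors' method. It assumes (1) the hyperparameter space is replaced by a small finite grid of RBF lengthscales, (2) the risk of the mixture is taken to be the expected held-out Gaussian negative log-likelihood under the hyperparameter distribution w, which is linear in w, and (3) the only penalty is maximum entropy, weighted by a "temperature". Under these assumptions, minimizing sum_k w_k R_k minus temperature times H(w) over the probability simplex has a closed-form Gibbs solution, w_k proportional to exp(-R_k / temperature). The function names (rbf_kernel, gp_predict, heldout_nll, max_entropy_weights), the grid, the noise level, and the toy data are all hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale):
    # Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 l^2)) on 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(x_tr, y_tr, x_te, lengthscale, noise):
    # Posterior predictive mean and variance of a zero-mean GP with Gaussian observation noise.
    K = rbf_kernel(x_tr, x_tr, lengthscale) + noise * np.eye(len(x_tr))
    Ks = rbf_kernel(x_te, x_tr, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = 1.0 + noise - np.sum(v ** 2, axis=0)  # k(x, x) = 1 for this kernel
    return mean, var


def heldout_nll(mean, var, y):
    # ASSUMED per-hyperparameter risk: average Gaussian NLL of held-out targets.
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mean) ** 2 / var)


def max_entropy_weights(risks, temperature):
    # Minimizer over the simplex of sum_k w_k R_k - temperature * H(w):
    # the Gibbs distribution w_k ∝ exp(-R_k / temperature). Shifting by min(R) avoids overflow.
    z = -(risks - risks.min()) / temperature
    w = np.exp(z)
    return w / w.sum()


# Toy data (hypothetical): noisy samples of sin(2x), split into training and validation sets.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 40)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(40)
idx = rng.permutation(40)
tr, va = idx[:30], idx[30:]

# Hypothetical finite grid standing in for the continuous hyperparameter space.
lengthscales = np.array([0.05, 0.3, 1.0, 3.0])
noise = 0.01
preds = [gp_predict(x[tr], y[tr], x[va], l, noise) for l in lengthscales]
risks = np.array([heldout_nll(m, s, y[va]) for m, s in preds])

# Distribution over the grid, then the mean of the resulting GP mixture on the validation inputs.
w = max_entropy_weights(risks, temperature=0.1)
mix_mean = sum(wk * m for wk, (m, _) in zip(w, preds))
```

Here the temperature plays the role of the penalty weight: a large value pushes w toward the uniform (maximum-entropy) distribution, while a small value concentrates it on the lowest-risk lengthscale, approaching a point estimate. With a sparsity penalty, a non-convex penalty, or a risk that is nonlinear in w, this closed form no longer applies and an iterative scheme, such as the duality-based stochastic optimization the abstract describes, would be needed.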