Fast Direct Methods for Gaussian Processes

IEEE Trans. Pattern Anal. Mach. Intell., 2016 (modified: 10 Nov 2022)
Abstract: A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal{O}(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion.
More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
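To make the quantities in the abstract concrete, the following is a minimal NumPy sketch (not the paper's hierarchical algorithm) of the dense $\mathcal{O}(n^3)$ baseline it improves on: evaluating the Gaussian log-density of data under $N(0, C)$ with $C = \sigma^2 I + K$, using a Cholesky factorization to obtain both $C^{-1}y$ and $\log\det(C)$. The squared-exponential kernel and the hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gp_log_likelihood(x, y, sigma=0.1, length_scale=1.0):
    """Log-density of y under N(0, C), C = sigma^2 I + K.

    K is an (assumed) squared-exponential kernel; the dense Cholesky
    factorization below is the O(n^3) step the paper replaces with an
    O(n log^2 n) hierarchical factorization.
    """
    n = len(x)
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2)   # covariance kernel on the data
    C = sigma ** 2 * np.eye(n) + K               # C = sigma^2 I + K
    L = np.linalg.cholesky(C)                    # dense factorization: O(n^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # C^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))            # log det(C)
    return -0.5 * (y @ alpha + log_det + n * np.log(2.0 * np.pi))

# Toy usage on synthetic one-dimensional data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(50)
print(gp_log_likelihood(x, y))
```

Both the solve and the determinant come from the same factorization, which mirrors the abstract's point: once $C$ is factored, inversion and $\det(C)$ are available together, so the factorization cost dominates the likelihood evaluation.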