Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2
Abstract: Satellite remote sensing provides a global view of
processes on Earth and offers unique benefits over ground-based
measurements, such as global coverage and enormous data volume.
The typical downsides are spatial and temporal gaps and potentially
low data quality. Meaningful statistical inference from such data
requires overcoming these problems and developing efficient and
robust computational tools. We design and implement a computationally
efficient multi-scale Gaussian process (GP) software package, satGP,
geared towards remote sensing applications. The software can handle
very large problems: it computes marginals and samples from the
random field conditioned on at least hundreds of millions of
observations. This is achieved by optimizing the computation with,
e.g., randomization and by splitting the problem into parallel local
subproblems that aggressively discard uninformative data.
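To make the idea of a local subproblem concrete, the following is a minimal sketch, not the satGP implementation. It assumes a zero-mean GP with an exponential kernel and keeps only the k nearest observations around a prediction point, discarding the rest. The function names (`exp_kernel`, `local_gp_predict`) and all parameter values are hypothetical, and the nearest-neighbour rule is a simple stand-in for the data selection used by the software.

```python
import numpy as np

def exp_kernel(a, b, ell=1.0, var=1.0):
    """Exponential covariance between the rows of a and the rows of b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return var * np.exp(-d / ell)

def local_gp_predict(x_obs, y_obs, x_star, k=20, ell=1.0, var=1.0, noise_var=1e-2):
    """Posterior mean and variance at x_star from the k nearest observations.

    Assumption: a zero-mean GP; observations outside the neighbourhood are
    treated as uninformative and dropped.
    """
    dist = np.linalg.norm(x_obs - x_star, axis=1)
    idx = np.argsort(dist)[:k]                     # local subproblem
    xs, ys = x_obs[idx], y_obs[idx]

    K = exp_kernel(xs, xs, ell, var) + noise_var * np.eye(len(idx))
    k_star = exp_kernel(xs, x_star[None, :], ell, var)[:, 0]

    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ys))
    v = np.linalg.solve(L, k_star)

    mean = k_star @ alpha                          # k_*^T K^{-1} y
    variance = var - v @ v                         # k_** - k_*^T K^{-1} k_*
    return mean, variance
```

Because each prediction point only touches its own small neighbourhood, such subproblems are independent of one another and can be distributed across workers.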
We describe the mean function of the Gaussian process by
approximating marginals of a Markov random field (MRF).
Variability around the mean is modeled with a multi-scale
covariance kernel, which consists of Matérn, exponential, and
periodic components. We also demonstrate how winds can be
used to inform covariances locally. The covariance kernel
parameters are learned by calculating an approximate marginal
maximum likelihood estimate, and the validity of both the
multi-scale approach and the method used to learn the kernel
parameters is verified in synthetic experiments.
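As an illustration of a multi-scale kernel, the sketch below combines a Matérn, an exponential, and a periodic component, each with its own amplitude and length scale. It rests on several assumptions not stated in the abstract: the components are summed, the Matérn smoothness is fixed at 3/2, and the periodic component acts on a time lag. The function names and the `params` keys are hypothetical.

```python
import math

def matern32(r, ell):
    """Matérn kernel with smoothness nu = 3/2 (one common choice)."""
    a = math.sqrt(3.0) * r / ell
    return (1.0 + a) * math.exp(-a)

def exponential(r, ell):
    """Exponential kernel, i.e. Matérn with nu = 1/2."""
    return math.exp(-r / ell)

def periodic(dt, period, ell):
    """Exp-sine-squared periodic kernel in the time lag dt."""
    return math.exp(-2.0 * math.sin(math.pi * dt / period) ** 2 / ell ** 2)

def multiscale_kernel(r, dt, params):
    """Sum of three components acting at different scales (assumed form)."""
    return (params["var_matern"] * matern32(r, params["ell_matern"])
            + params["var_exp"] * exponential(r, params["ell_exp"])
            + params["var_per"] * periodic(dt, params["period"], params["ell_per"]))

# Hypothetical parameter values: a long-range smooth component, a
# short-range rough component, and an annual cycle (time lag in days).
params = {
    "var_matern": 1.0, "ell_matern": 500.0,
    "var_exp": 0.3, "ell_exp": 20.0,
    "var_per": 0.5, "period": 365.25, "ell_per": 1.0,
}
k0 = multiscale_kernel(0.0, 0.0, params)
```

At zero spatial and temporal lag every component equals its amplitude, so `k0` is the sum of the three variances. In a marginal likelihood fit, the entries of `params` would be the quantities being learned.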
We apply these techniques to a moderately sized ozone data
set produced by an atmospheric chemistry model and to the
very large number of observations retrieved from the Orbiting
Carbon Observatory 2 (OCO-2) satellite. The satGP software
is released under an open-source license.