Gaussian Process Latent Factor Regression for Low-Data, High-Dimensional Output Problems

Published: 25 May 2026, Last Modified: 25 May 2026, ProbML 2026 Workshop Track, CC BY 4.0
Abstract: In the sciences, regression tasks often require predicting high-dimensional outputs from few training examples. Multi-output Gaussian processes excel in low-data regimes but typically struggle with high-dimensional outputs. Compress-then-predict pipelines such as PCA-GP (principal component analysis plus Gaussian process regression) handle high dimensionality, but rely on bases optimized for reconstruction rather than prediction. To address this gap, we propose a model that represents each output as a linear-Gaussian decoding of a low-dimensional latent state drawn from a Gaussian process prior. By analytically marginalizing the decoder weights, we couple compression and prediction in a single objective that scales to high-dimensional outputs. We refer to this model as Gaussian process latent factor regression (GPLFR). We validate the method on a synthetic benchmark and demonstrate its potential by constructing the first spatially resolved emulator of global climate models for rocky exoplanets.
Keywords: Gaussian processes, latent factor regression, high-dimensional outputs, low-data learning, emulation, exoplanets, climate modeling
TLDR: We introduce GPLFR, a latent factor model with GP priors designed for predicting high-dimensional outputs from few training examples, and use it to build the first spatially resolved emulator of global climate models for rocky exoplanets.
Submission Number: 13
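The abstract describes the core mechanism without formulas. The following is a minimal sketch of one way such a model could be written, not the authors' implementation. All of its specifics are assumptions: outputs Y (N x D) are modeled as Y = Z W^T + noise, with latent states Z (N x K) whose columns have independent zero-mean GP priors sharing one RBF kernel over inputs X; decoder rows have an isotropic prior w_d ~ N(0, I / alpha); and the noise is isotropic with standard deviation `noise`. Marginalizing W then makes each output column y_d ~ N(0, Z Z^T / alpha + noise^2 I), so the coupled objective log p(Y | Z) + log p(Z | X) needs only N x N matrices. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of A (N x P) and B (M x P).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gaussian_logpdf_columns(Y, C):
    # Sum over columns y_d of log N(y_d; 0, C), reusing one Cholesky factor of C (N x N).
    N, D = Y.shape
    L = np.linalg.cholesky(C)
    A = np.linalg.solve(L, Y)                      # L^{-1} Y, shape N x D
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (np.sum(A**2) + D * logdet + N * D * np.log(2.0 * np.pi))

def gplfr_objective(Z, X, Y, alpha=1.0, noise=0.1, lengthscale=1.0, jitter=1e-5):
    # Hypothetical coupled objective: log p(Y | Z) with W marginalized, plus log p(Z | X).
    N = X.shape[0]
    C_y = Z @ Z.T / alpha + noise**2 * np.eye(N)   # marginal covariance of each output column
    log_lik = gaussian_logpdf_columns(Y, C_y)
    K_x = rbf_kernel(X, X, lengthscale) + jitter * np.eye(N)
    log_prior = gaussian_logpdf_columns(Z, K_x)    # K independent GP priors on latent columns
    return log_lik + log_prior

def gplfr_predict(Z, X, Y, X_new, alpha=1.0, noise=0.1, lengthscale=1.0, jitter=1e-5):
    # Predictive mean: GP-interpolate the latent state, then decode with the posterior-mean W.
    N, K = Z.shape
    K_x = rbf_kernel(X, X, lengthscale) + jitter * np.eye(N)
    Z_new = rbf_kernel(X_new, X, lengthscale) @ np.linalg.solve(K_x, Z)
    # Posterior mean of W^T under the assumed prior: (Z^T Z + alpha noise^2 I)^{-1} Z^T Y, shape K x D.
    W_mean_T = np.linalg.solve(Z.T @ Z + alpha * noise**2 * np.eye(K), Z.T @ Y)
    return Z_new @ W_mean_T

# Usage on random data, with an assumed PCA-style initialization of the latents.
rng = np.random.default_rng(0)
N, P, D, K = 20, 2, 500, 3
X = rng.uniform(size=(N, P))
Y = rng.normal(size=(N, D))
Yc = Y - Y.mean(0)
U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
Z0 = U[:, :K] * S[:K] / np.sqrt(D)
print(gplfr_objective(Z0, X, Yc, lengthscale=0.3))
print(gplfr_predict(Z0, X, Yc, X[:5], lengthscale=0.3).shape)  # (5, 500)
```

Because all D output columns share one N x N covariance, a single Cholesky factorization serves every column. Under these assumptions the objective costs roughly O(N^3 + N^2 D), linear in the output dimension, which is consistent with the abstract's claim that the approach scales to high-dimensional outputs. In practice Z (and optionally alpha, noise, and the lengthscale) would be optimized by gradient ascent on this objective. That optimization loop is omitted here.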