Bayesian Basis Function Approximation for Scalable Gaussian Process Priors in Deep Generative Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-SA 4.0
Abstract: High-dimensional time-series datasets are common in domains such as healthcare and economics. Variational autoencoder (VAE) models in which the latent variables follow a Gaussian process (GP) prior have become a prominent model class for analyzing such correlated datasets. However, their application is challenged by the cubic time complexity inherent to GPs, which requires specific GP approximation techniques, as well as by the general difficulty of modeling both shared and individual-specific correlations across time. Although inducing points improve the scalability of GP prior VAEs, optimizing them remains challenging, especially because discrete covariates resist gradient-based methods. In this work, we propose a scalable basis function approximation technique for GP prior VAEs that mitigates these challenges and achieves linear time complexity. Its global parametrization eliminates the need for amortized variational inference and the associated amortization gap, making the method well suited for conditional generation tasks where accuracy and efficiency are crucial. Empirical evaluations on synthetic and real-world benchmark datasets demonstrate that our approach not only improves scalability and interpretability but also substantially improves predictive performance.
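The authors' implementation is linked below; as a rough illustration of the general idea only, the following minimal sketch shows a standard reduced-rank basis function approximation of a 1D GP prior in the style of Solin and Särkkä's Hilbert-space method, where the kernel is approximated as K ≈ Φ diag(s) Φᵀ so that drawing a prior sample costs O(NM) rather than O(N³). All function names, parameters, and kernel choices here are illustrative assumptions, not the paper's actual method.

```python
# A minimal sketch (not the authors' implementation, assumed details) of a
# reduced-rank basis-function GP approximation: the kernel is approximated
# as K ~= Phi diag(s) Phi^T, so sampling the GP prior costs O(N*M) time.
import numpy as np

def basis_functions(t, M, L):
    """Laplacian eigenfunctions on [-L, L], evaluated at times t (shape [N])."""
    j = np.arange(1, M + 1)  # basis indices 1..M
    return np.sin(np.pi * j * (t[:, None] + L) / (2 * L)) / np.sqrt(L)  # [N, M]

def rbf_spectral_density(omega, variance, lengthscale):
    """Spectral density of the squared-exponential (RBF) kernel in 1D."""
    return variance * np.sqrt(2 * np.pi) * lengthscale * np.exp(
        -0.5 * (lengthscale * omega) ** 2)

def sample_gp_prior(t, M=32, L=5.0, variance=1.0, lengthscale=1.0, rng=None):
    """Draw one approximate GP prior sample in O(N*M) time."""
    rng = np.random.default_rng(rng)
    sqrt_eigvals = np.pi * np.arange(1, M + 1) / (2 * L)  # sqrt of eigenvalues
    s = rbf_spectral_density(sqrt_eigvals, variance, lengthscale)  # basis weights
    Phi = basis_functions(t, M, L)
    w = rng.standard_normal(M)  # i.i.d. Gaussian basis coefficients
    return Phi @ (np.sqrt(s) * w)  # f(t) ~= sum_j phi_j(t) * sqrt(s_j) * w_j

t = np.linspace(-4, 4, 200)
f = sample_gp_prior(t, rng=0)  # one sample path of the approximate prior
```

Because the M basis coefficients are global and Gaussian, such a parametrization can be optimized directly, which is consistent with the abstract's point that a global parametrization avoids amortized variational inference.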
Lay Summary: In many areas, including healthcare, we collect vast amounts of time-stamped data, like patients’ vital signs over time. Traditional tools for analyzing this kind of data either compare every point to every other point (which becomes very slow as datasets grow) or simplify the problem by picking a few representative examples (which can miss important differences, especially when tracking many different groups). In our work, we propose a more efficient way to capture how data evolves. We use a set of simple mathematical building blocks, called basis functions, to represent smooth patterns in the data instead of comparing everything or picking key samples. This approach keeps the computation growing at the same rate as the dataset itself, even when it includes many different types of subjects. We also simplify the learning process by using a single shared model for the entire dataset, which helps avoid errors introduced by extra estimation steps. Our method provided more accurate predictions than previous approaches when tested on both simulated and real-world datasets. This makes it especially useful for tasks that require generating or forecasting time-series data quickly and reliably.
Link To Code: https://github.com/YigitBalik/DGBFGP
Primary Area: Deep Learning->Sequential Models, Time series
Keywords: Gaussian Process, High-dimensional time-series, Variational Autoencoder, Conditional Generation, Reproducing Kernel Hilbert Space
Submission Number: 6596