How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs

Published: 26 May 2026, Last Modified: 15 Jun 2026, Venue: ICML 2026 FoGen Workshop Poster, License: CC BY 4.0
Keywords: Bayesian methods, deep Gaussian processes, compositional priors, depth limits
TL;DR: We give a sharp bandwidth threshold r_c(d) for compositional GP priors: above it the prior collapses onto constants, while below it the prior has a non-degenerate, non-Gaussian infinite-depth limit that produces complex multimodal samples.
Abstract: Compositional priors describe the generic properties of layered functions in deep Bayesian models, of which deep neural networks with random weights are a canonical example. In the wide-network limit, the prior is a Gaussian process (GP) with a depth-dependent kernel, and its behaviour as depth grows has been studied extensively through this kernel. Here we study a different case, in which each layer is itself a vector-valued Gaussian process, and our aim is likewise to understand the structure these priors induce as depth grows. Previous GP work has established that, for the radial basis function (RBF) kernel and a certain range of bandwidths $r$, the prior degenerates in the limit and converges to the set of constant functions, which is not useful as a probabilistic or generative model. In this paper we establish several new results. First, we identify a sharp bandwidth threshold $r_c(d) = \Theta(\sqrt{d})$ above which the limit is degenerate, strengthening the earlier bounds. Second, and more importantly, we show that for $r$ below the threshold $r_c(d)$ the prior converges to a limit distribution $\pi_{\bar{Z}}$. We also prove that these distributions are non-degenerate and non-Gaussian, with non-vanishing dependence between coordinates. In contrast to the previously known degenerate regime, deep Gaussian process priors can therefore admit non-trivial limits. Empirically, we verify the threshold across a range of dimensions $d$ and demonstrate that the limit distributions $\pi_{\bar{Z}}$ can generate complex multimodal samples, in a regime that becomes increasingly narrow as $d$ grows and would be hard to identify without knowing the threshold.
Submission Number: 94
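
To make the compositional construction in the abstract concrete, the following is a minimal sketch of sampling such a prior on a finite set of inputs, not the authors' implementation. It assumes each layer has $d$ independent output coordinates that share an RBF kernel of the form $\exp(-\lVert z - z' \rVert^2 / (2 r^2))$; the abstract does not state the kernel parametrisation, so this scaling of the bandwidth $r$ is an assumption, as are the function names `rbf_kernel`, `sample_deep_gp`, and `mean_pairwise_distance`.

```python
import numpy as np


def rbf_kernel(Z, r):
    """RBF Gram matrix exp(-||z_i - z_j||^2 / (2 r^2)) (assumed parametrisation)."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    d2 = np.maximum(d2, 0.0)  # guard against small negative rounding errors
    return np.exp(-d2 / (2.0 * r**2))


def sample_deep_gp(X, depth, r, d, rng):
    """Push n inputs X (shape n x d_in) through `depth` GP layers.

    Each layer draws d independent zero-mean GP coordinates whose Gram
    matrix is computed on the previous layer's outputs.
    """
    Z = np.asarray(X, dtype=float)
    n = Z.shape[0]
    for _ in range(depth):
        K = rbf_kernel(Z, r)
        # Eigendecomposition instead of Cholesky: K can be numerically
        # rank-deficient when the layer outputs are close to each other.
        w, V = np.linalg.eigh(K)
        root = V * np.sqrt(np.clip(w, 0.0, None))  # root @ root.T ~= K
        Z = root @ rng.standard_normal((n, d))
    return Z


def mean_pairwise_distance(Z):
    """Average Euclidean distance between distinct rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    n = Z.shape[0]
    return dist.sum() / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.linspace(-1.0, 1.0, 20)[:, None]
    for r in (0.5, 1.0, 100.0):  # illustrative bandwidths, not the paper's values
        Z = sample_deep_gp(X, depth=30, r=r, d=2, rng=rng)
        print(r, mean_pairwise_distance(Z))
```

The mean pairwise distance between the outputs is one simple diagnostic for collapse onto constants: with a very large bandwidth it should shrink towards zero as depth grows. The sketch makes no attempt to locate $r_c(d)$ itself, which depends on the paper's exact conventions.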