Keywords: Deep Gaussian Processes, Optimization, Mode Collapse, Whitening, Initialization
TL;DR: We analyze the possible causes of mode collapse in DGPs (where, during training, the variational posterior collapses onto the prior distribution) and propose a solution that avoids it.
Abstract: Deep Gaussian Processes (DGPs) define a hierarchical model capable of learning complex, non-stationary processes. Exact inference is intractable in DGPs, so a variational distribution is used in each layer. One of the main challenges when training DGPs is preventing a phenomenon known as mode collapse, where, during training, the variational distribution becomes the prior distribution, which is a minimizer of the KL divergence term in the ELBO. Two main factors influence the optimization process: the mean function of the inner GPs and the use of the whitened representation of the variational distribution. In this work, we propose a data-driven initialization of the variational parameters that a) already at initialization, predicts a good approximation of the objective function, b) avoids mode collapse, and c) is supported by a theoretical analysis of the behavior of the KL divergence and by experimental results on real-world datasets.
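The sketch below is a minimal, hypothetical illustration (not the paper's actual procedure) of the mechanism described in the abstract: for the Gaussian variational distribution over inducing outputs, setting q(u) equal to the prior p(u) drives the KL term of the ELBO to zero, whereas a data-driven initialization of the variational mean keeps q(u) informative. The inducing-point setup, RBF kernel, and nearest-neighbor target initialization are assumptions for illustration only.

```python
import numpy as np

def kl_gaussians(mu_q, S_q, mu_p, S_p):
    """KL( N(mu_q, S_q) || N(mu_p, S_p) ) for full-covariance Gaussians."""
    d = mu_q.shape[0]
    S_p_inv = np.linalg.inv(S_p)
    diff = mu_p - mu_q
    return 0.5 * (np.trace(S_p_inv @ S_q)
                  + diff @ S_p_inv @ diff
                  - d
                  + np.log(np.linalg.det(S_p) / np.linalg.det(S_q)))

rng = np.random.default_rng(0)
M = 5                                    # number of inducing points (assumption)
Z = rng.uniform(-3, 3, size=(M, 1))      # inducing inputs
K_zz = np.exp(-0.5 * (Z - Z.T) ** 2) + 1e-6 * np.eye(M)   # RBF prior covariance

# Mode collapse: q(u) = p(u) = N(0, K_zz) makes the KL term of the ELBO zero.
print(kl_gaussians(np.zeros(M), K_zz, np.zeros(M), K_zz))  # ~0.0

# A (purely illustrative) data-driven alternative: initialize the variational
# mean from the targets observed near each inducing input instead of zeros.
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
q_mu = np.array([y[np.argmin(np.abs(X[:, 0] - z))] for z in Z[:, 0]])
print(kl_gaussians(q_mu, K_zz, np.zeros(M), K_zz))          # > 0: informative init
```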
Submission Number: 53