Deep Ensemble Kernel Learning

28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone
Keywords: kernel-learning, gaussian-process, Bayesian, ensemble
Abstract: Gaussian processes (GPs) are nonparametric Bayesian models that are both flexible and robust to overfitting. One of the main challenges of GP methods is selecting the kernel. In the deep kernel learning (DKL) paradigm, a deep neural network or ``feature network'' is used to map inputs into a latent feature space, where a GP with a ``base kernel'' acts; the resulting model is then trained in an end-to-end fashion. In this work, we introduce the ``deep ensemble kernel learning'' (DEKL) model, which is a special case of DKL. In DEKL, a linear base kernel is used, enabling exact optimization of the base kernel hyperparameters and a scalable inference method that does not require approximation by inducing points. We also represent the feature network as a concatenation of an ensemble of learner networks with a common architecture, allowing for easy model parallelism. We show that DEKL can approximate any kernel as the number of learners in the ensemble grows arbitrarily large. Comparing the DEKL model to DKL and deep ensemble (DE) baselines on both synthetic and real-world regression tasks, we find that DEKL often outperforms both baselines in terms of predictive performance and that the DEKL learners tend to be more diverse (i.e., less correlated with one another) compared to the DE learners.
One-sentence Summary: We present a joint training method for neural network ensembles using deep kernel learning with a linear kernel, derive an efficient variational inference framework for it, and show that it is a universal kernel approximator.
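The abstract describes the core construction: concatenate an ensemble of learner networks into a single feature map and place a GP with a linear base kernel on top, so exact inference scales with the feature dimension rather than the number of data points. Below is a minimal sketch of that idea in PyTorch; the class names (Learner, DEKLSketch), architectures, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import math

import torch
import torch.nn as nn


class Learner(nn.Module):
    """One ensemble member: a small MLP mapping inputs to a feature block."""
    def __init__(self, in_dim, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x):
        return self.net(x)


class DEKLSketch(nn.Module):
    """Concatenated ensemble features with an exact linear-kernel GP on top (illustrative)."""
    def __init__(self, in_dim, n_learners=4, feat_dim=8, noise=0.1):
        super().__init__()
        self.learners = nn.ModuleList(
            [Learner(in_dim, feat_dim) for _ in range(n_learners)]
        )
        self.log_noise = nn.Parameter(torch.tensor(noise).log())

    def features(self, x):
        # phi(x): outputs of all learners concatenated (easy model parallelism).
        return torch.cat([f(x) for f in self.learners], dim=-1)

    def neg_log_marginal_likelihood(self, x, y):
        # Linear base kernel k(x, x') = phi(x)^T phi(x'), so y ~ N(0, Phi Phi^T + s2 I).
        # Work with the d x d matrix A = Phi^T Phi + s2 I: cost O(n d^2), no inducing points.
        phi = self.features(x)                                  # (n, d)
        n, d = phi.shape
        s2 = (2 * self.log_noise).exp()                         # noise variance
        A = phi.T @ phi + s2 * torch.eye(d)
        L = torch.linalg.cholesky(A)
        # det(Phi Phi^T + s2 I_n) = s2^(n - d) * det(A)  (matrix determinant lemma)
        logdet = 2 * torch.diagonal(L).log().sum() + (n - d) * torch.log(s2)
        # y^T (Phi Phi^T + s2 I)^{-1} y via the Woodbury identity
        b = phi.T @ y.unsqueeze(-1)                             # (d, 1)
        v = torch.cholesky_solve(b, L)
        quad = (y @ y - (b * v).sum()) / s2
        return 0.5 * (quad + logdet + n * math.log(2 * math.pi))
```

Under these assumptions, end-to-end training amounts to minimizing neg_log_marginal_likelihood over both the learner weights and the noise parameter with a standard optimizer such as Adam; predictions follow from the same weight-space quantities.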
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=BSEbOec40c