Why Are Bootstrapped Deep Ensembles Not Better?

Published: 09 Dec 2020, Last Modified: 05 May 2023, ICBINB 2020 Poster
Keywords: Deep Learning, Ensembles, Uncertainty Estimation
TL;DR: We investigate a major hypothesis for the weak performance of bootstrap ensembles, namely dataset size, and find that the number of unique datapoints in the training set is the main determinant of ensemble performance.
Abstract: Ensemble methods have consistently reached state of the art across predictive, uncertainty, and out-of-distribution robustness benchmarks. One of the most popular ways to construct an ensemble is to independently train each model on a resampled (bootstrapped) version of the dataset. Bootstrapping is popular in the literature on decision trees and frequentist statistics, with strong theoretical guarantees, but it is not used often in practice for deep neural networks. We investigate a common hypothesis for bootstrap's weak performance—the percentage of unique points in the subsampled dataset—and find that even when adjusting for it, bootstrap ensembles of deep neural networks yield no benefit over simpler baselines. This calls into question the role of data randomization as a source of uncertainty in deep learning.
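
For readers unfamiliar with the construction discussed in the abstract, below is a minimal sketch of a bootstrapped deep ensemble, assuming PyTorch and a toy regression MLP; the function names (`bootstrap_ensemble`, `ensemble_predict`), architecture, and hyperparameters are illustrative and not taken from the paper.

```python
import numpy as np
import torch
from torch import nn

def train_member(X, y, epochs=100, lr=1e-2):
    """Train one ensemble member on (X, y); a toy MLP stands in for any network."""
    model = nn.Sequential(nn.Linear(X.shape[1], 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

def bootstrap_ensemble(X, y, n_members=5, seed=0):
    """Train each member on an independent resample (with replacement) of the data.
    A bootstrap sample of size n contains roughly 63.2% unique points in expectation."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = torch.as_tensor(rng.integers(0, len(X), size=len(X)))
        members.append(train_member(X[idx], y[idx]))
    return members

def ensemble_predict(members, X):
    """Mean prediction, with the spread across members as a crude uncertainty proxy."""
    with torch.no_grad():
        preds = torch.stack([m(X) for m in members])
    return preds.mean(dim=0), preds.std(dim=0)

# Toy usage on synthetic regression data.
X = torch.randn(256, 4)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)
members = bootstrap_ensemble(X, y)
mean, std = ensemble_predict(members, X)
```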