Distilling Ensembles Improves Uncertainty Estimates

Published: 21 Dec 2020, Last Modified: 05 May 2023, AABI 2020
Keywords: distillation, uncertainty, ensembles, batch ensembles, weight reconstruction
TL;DR: We show that batch ensembles cannot be reconstructed directly from the weights of deep ensembles, but that distilling deep ensembles into batch ensembles significantly improves their calibration without sacrificing accuracy.
Abstract: We seek to bridge the performance gap between batch ensembles (ensembles of deep networks with shared parameters) and deep ensembles on tasks that require not only predictions, but also uncertainty estimates for those predictions. We obtain negative theoretical results on the possibility of approximating deep ensemble weights with batch ensemble weights, and therefore turn to distillation. Training a batch ensemble on the outputs of a deep ensemble improves accuracy and uncertainty estimates, without requiring hyper-parameter tuning. This result is specific to the batch ensemble architecture: distilling deep ensembles into a single network is unsuccessful, despite single networks having only marginally fewer parameters than batch ensembles.
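
Below is a minimal sketch (not the authors' code) of the distillation setup the abstract describes: a deep ensemble teacher whose averaged softmax outputs serve as soft labels, and a batch ensemble student with one shared weight matrix plus per-member rank-1 factors. The layer shapes, toy data, optimizer settings, and KL-based distillation loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchEnsembleLinear(nn.Module):
    """Linear layer with one shared weight matrix and per-member rank-1 factors."""
    def __init__(self, in_features, out_features, n_members):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.r = nn.Parameter(torch.ones(n_members, in_features))    # per-member input scaling
        self.s = nn.Parameter(torch.ones(n_members, out_features))   # per-member output scaling
        self.bias = nn.Parameter(torch.zeros(n_members, out_features))

    def forward(self, x):  # x: (n_members, batch, in_features)
        h = (x * self.r.unsqueeze(1)) @ self.weight.t()
        return h * self.s.unsqueeze(1) + self.bias.unsqueeze(1)

def batch_ensemble_mlp(n_in, n_hidden, n_out, n_members):
    return nn.Sequential(
        BatchEnsembleLinear(n_in, n_hidden, n_members), nn.ReLU(),
        BatchEnsembleLinear(n_hidden, n_out, n_members),
    )

# Hypothetical teacher: a deep ensemble of independently initialized MLPs.
n_in, n_hidden, n_out, n_members = 20, 64, 5, 4
teachers = [nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_out))
            for _ in range(n_members)]
student = batch_ensemble_mlp(n_in, n_hidden, n_out, n_members)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(128, n_in)  # toy unlabeled batch used for distillation

# Teacher soft labels: average of the ensemble members' softmax outputs.
with torch.no_grad():
    teacher_probs = torch.stack([F.softmax(t(x), dim=-1) for t in teachers]).mean(0)

# Student prediction: average of the batch ensemble members' softmax outputs.
x_rep = x.unsqueeze(0).expand(n_members, -1, -1)   # every member sees the same batch
student_probs = F.softmax(student(x_rep), dim=-1).mean(0)

# Distillation step: match the student's predictive distribution to the teacher's.
loss = F.kl_div(student_probs.log(), teacher_probs, reduction="batchmean")
opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch the only new parameters per member are the rank-1 factors and biases, which is why a batch ensemble is only marginally larger than a single network while still producing a distribution over member predictions to calibrate against the teacher.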