MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Ensembles, Robust Learning, Efficient Computing
Abstract: Ensembles are a very effective way of increasing both the robustness and accuracy of a learning system. Yet they are memory- and compute-intensive: in a naive ensemble, $n$ networks are trained independently and $n$ networks must be stored. Recently, BatchEnsemble \citep{wen2020batchensemble} and MIMO \citep{havasi2020training} have significantly decreased the memory footprint while approaching the classification performance of a naive ensemble. We improve on these methods with MixtureEnsembles, which factorize ensemble members with shared parameters by constructing each layer as a linear combination of templates; each ensemble member is then defined by a different set of combination weights. By modulating the number of available templates, MixtureEnsembles are uniquely flexible and scale easily between the low-parameter and high-parameter regimes. In the low-parameter regime, MixtureEnsembles outperform BatchEnsemble on both ImageNet and CIFAR and are competitive with MIMO. In the high-parameter regime, MixtureEnsembles outperform all baselines on CIFAR and ImageNet. This flexibility lets users control the precise performance-memory trade-off without any changes to the backbone architecture. When we additionally tune the backbone width, we outperform all baselines in the low-parameter regime at the same inference FLOP footprint.
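To make the layer factorization concrete, the following is a minimal PyTorch-style sketch of a layer whose weight is a linear combination of shared templates, with each ensemble member owning only its combination coefficients. The class name MixtureLinear, the parameter names (num_templates, num_members, member_coeffs), and the initialization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureLinear(nn.Module):
    """Sketch: a linear layer whose weight is a per-member linear combination
    of shared templates (illustrative; not the authors' implementation)."""

    def __init__(self, in_features, out_features, num_templates, num_members):
        super().__init__()
        # Bank of weight templates shared by every ensemble member.
        self.templates = nn.Parameter(
            torch.randn(num_templates, out_features, in_features) * 0.01
        )
        # Each member stores only a small coefficient vector and a bias.
        self.member_coeffs = nn.Parameter(torch.randn(num_members, num_templates))
        self.bias = nn.Parameter(torch.zeros(num_members, out_features))

    def forward(self, x, member):
        # Effective weight for this member: sum_k coeff[k] * template[k].
        weight = torch.einsum("k,koi->oi", self.member_coeffs[member], self.templates)
        return F.linear(x, weight, self.bias[member])

# Ensemble prediction: average member outputs at inference time.
layer = MixtureLinear(in_features=128, out_features=10, num_templates=4, num_members=3)
x = torch.randn(32, 128)
logits = torch.stack([layer(x, m) for m in range(3)]).mean(dim=0)  # shape (32, 10)
```

Under this sketch, adding an ensemble member costs only a coefficient vector and a bias per layer, while the template bank size (num_templates) sets the overall parameter budget.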
One-sentence Summary: Parameter-efficient, performant, and flexible ensembles.
Supplementary Material: zip