Sparse MoEs meet Efficient Ensembles

James Urquhart Allingham; Florian Wenzel; Zelda E Mariet; Basil Mustafa; Joan Puigcerver; Neil Houlsby; Ghassen Jerfel; Vincent Fortuin; Balaji Lakshminarayanan; Jasper Snoek; Dustin Tran; Carlos Riquelme Ruiz; Rodolphe Jenatton

Published: 28 Jan 2022, Last Modified: 22 Jun 2025, ICLR 2022 Submitted, Readers: Everyone

Keywords: Ensembles, Sparse MoEs, Robustness, Uncertainty Calibration, OOD detection, Efficient Ensembles, Large scale, Computer vision

Abstract: Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, lead to strong performance. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that these two approaches have complementary features whose combination is beneficial. Then, we present partitioned batch ensembles, an efficient ensemble of sparse MoEs that takes the best of both classes of models. Extensive experiments on fine-tuned vision transformers demonstrate the accuracy, log-likelihood, few-shot learning, robustness, and uncertainty calibration improvements of our approach over several challenging baselines. Partitioned batch ensembles not only scale to models with up to 2.7B parameters, but also provide larger performance gains for larger models.
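To make the distinction between prediction-level and activation-level aggregation concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the softmax-then-top-k gating without renormalization, and the linear experts are all illustrative assumptions, since the abstract does not specify the routing or the expert architecture.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ensemble_predict(member_logits):
    """Prediction-level aggregation (ensemble-style).

    member_logits: array of shape (members, batch, classes).
    Returns the average of the members' class probabilities.
    """
    return softmax(member_logits, axis=-1).mean(axis=0)


def sparse_moe_layer(x, gate_w, expert_ws, k=1):
    """Activation-level aggregation (sparse-MoE-style), hypothetical layer.

    x: (batch, d_in); gate_w: (d_in, experts); expert_ws: (experts, d_in, d_out).
    Each input is routed to its k highest-scoring experts only, and their
    outputs are summed, weighted by the (unrenormalized) gate values.
    """
    gates = softmax(x @ gate_w, axis=-1)          # (batch, experts)
    top_k = np.argsort(-gates, axis=-1)[:, :k]    # indices of the k largest gates
    out = np.zeros((x.shape[0], expert_ws.shape[2]))
    for i in range(x.shape[0]):
        for e in top_k[i]:
            # Assumption: experts are plain linear maps, for illustration only.
            out[i] += gates[i, e] * (x[i] @ expert_ws[e])
    return out
```

In this sketch every ensemble member processes every input, whereas the sparse MoE layer evaluates only `k` experts per input, which is the sense in which the two aggregation schemes differ.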

One-sentence Summary: We analyse and combine sparse MoE models with ensembles, to better understand the interplay between these two kinds of models, resulting in a new algorithm that provides the best of both worlds for vision transformers with up to 2.7B parameters.

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/sparse-moes-meet-efficient-ensembles/code)
