Theoretical Limitations of Ensembles in the Age of Overparameterization

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 Oral, License: CC BY 4.0
TL;DR: We theoretically characterize the generalization and uncertainty properties of overparameterized random feature regressors, proving a functional equivalence between ensembles and single (but larger) models under weak assumptions.
Abstract: Classic ensembles generalize better than any single component model. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single but larger neural networks. This paper clarifies how modern overparameterized ensembles differ from their classic underparameterized counterparts, using ensembles of random feature (RF) regressors as a basis for developing theory. In contrast to the underparameterized regime, where ensembling typically induces regularization and improves generalization, we prove with minimal assumptions that infinite ensembles of overparameterized RF regressors become pointwise equivalent to (single) infinite-width RF regressors, and finite-width ensembles rapidly converge to single models with the same parameter budget. These results, which are exact for ridgeless models and approximate for small ridge penalties, imply that overparameterized ensembles and single large models exhibit nearly identical generalization. We further characterize the predictive variance amongst ensemble members, demonstrating that it quantifies the expected effects of increasing capacity rather than capturing any conventional notion of uncertainty. Our results challenge common assumptions about the advantages of ensembles in overparameterized settings, prompting a reconsideration of how well intuitions from underparameterized ensembles transfer to deep ensembles and the overparameterized regime.
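The equivalence stated in the abstract can be sanity-checked numerically. The sketch below (illustrative only, not the authors' released code) averages K overparameterized ridgeless RF regressors and compares the result pointwise against a single RF regressor with the same total feature budget; the ReLU random features, minimum-norm least-squares solver, and toy 1-D target are assumptions chosen purely for demonstration.

```python
# Illustrative sketch (not the authors' code): compare an ensemble of
# overparameterized random-feature (RF) ridgeless regressors to a single
# RF regressor with the same total parameter budget.
# Assumed setup: ReLU random features, min-norm (pseudo-inverse) least squares,
# 1-D inputs, and a toy sine target.
import numpy as np

rng = np.random.default_rng(0)

def relu_features(x, W, b):
    """Random ReLU feature map phi(x) = max(0, x W + b)."""
    return np.maximum(0.0, x @ W + b)

def fit_ridgeless_rf(X, y, width):
    """Fit one overparameterized RF regressor via the minimum-norm solution."""
    W = rng.normal(size=(X.shape[1], width))
    b = rng.normal(size=width)
    Phi = relu_features(X, W, b)
    theta = np.linalg.pinv(Phi) @ y  # ridgeless / minimum-norm least squares
    return lambda Xnew: relu_features(Xnew, W, b) @ theta

# Toy data: n = 20 points, so width >= 20 makes each member interpolating.
n, K, width = 20, 50, 200
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)
X_test = np.linspace(-1, 1, 200).reshape(-1, 1)

# Ensemble of K regressors, each of width `width`.
ensemble_pred = np.mean(
    [fit_ridgeless_rf(X, y, width)(X_test) for _ in range(K)], axis=0
)

# Single regressor with the same total parameter budget (K * width features).
single_pred = fit_ridgeless_rf(X, y, K * width)(X_test)

print("max |ensemble - single| on test grid:",
      np.max(np.abs(ensemble_pred - single_pred)))
```

Under these assumptions, the maximum pointwise gap on the test grid shrinks as K and the per-member width grow, mirroring the stated convergence of finite-width ensembles to single models with the same parameter budget.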
Lay Summary: In safety-critical applications like medical diagnosis or self-driving cars, researchers often combine multiple AI models into so-called "ensembles" to improve predictions – similar to consulting a committee rather than a single expert. This approach has worked well for simple models, but with today's powerful neural networks that can memorize entire datasets, ensembles often fail to deliver the expected benefits. We analyzed this mathematically using simplified neural networks. We discovered that when models are complex enough to memorize their training data, ensembles of them behave almost exactly like a single, larger model. This means ensembling large models offers little gain over simply training a single, bigger model. Furthermore, we found that a common method for estimating the uncertainty of ensemble predictions – measuring disagreement among ensemble members – lacks theoretical grounding in such cases. Our results don't deny that ensembles can still be useful in practice, since larger models might, for example, be hard to train. However, they caution against viewing ensembles as a simple and reliable strategy for boosting performance over what a single larger model could achieve, or for assessing uncertainty.
Link To Code: https://github.com/nic-dern/theoretical-limitations-overparameterized-ensembles
Primary Area: Theory->Learning Theory
Keywords: Ensembles, Deep Ensembles, Uncertainty Quantification, Overparameterization, Random feature regression, Kernel regression
Submission Number: 7902