Understanding the Role of Functional Diversity in Weight-Ensembling with Ingredient Selection and Multidimensional Scaling
Keywords: Weight ensembling, model souping, functional diversity, loss basin
TL;DR: We develop a weight-averaging technique that is next-step optimal when adding components, which lets us explore how functional and positional diversity relate to selection and, in turn, improve a weight average.
Abstract: Weight-ensembled models, formed when the parameters of multiple neural networks are directly averaged into a single model, have demonstrated a generalization capability in-distribution (ID) and out-of-distribution (OOD) that is not completely understood, though weight-ensembles are thought to successfully exploit the functional diversity contributed by each distinct model. Given a collection of models, it is also unclear which combination leads to the optimal weight-ensemble; the state of the art is a linear-time "greedy" method. We introduce two novel methods with targeted model-selection mechanisms to study the link between each method's performance dynamics and how it decides to apply the functionally diverse components. We develop a visualization tool that explains how each algorithm explores various domains defined via pairwise distances, to further investigate selection and the algorithms' convergence. Empirical analyses reinforce that high diversity enhances weight-ensembling, while qualifying the extent to which diversity alone improves accuracy, and demonstrate that sampling positionally distinct models can contribute just as meaningfully to improvements in a weight-ensemble.
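To make the two notions in the abstract concrete, the following is a minimal sketch of uniform weight averaging and of a one-pass greedy selection rule. It is not the submission's implementation: the names `average_weights`, `greedy_soup`, and `score_fn`, the dictionary-of-arrays model format, and the "keep the candidate if the held-out score does not drop" acceptance rule are all assumptions made for illustration.

```python
import numpy as np

def average_weights(models):
    """Uniformly average parameter dicts (name -> ndarray) into one model."""
    return {k: np.mean([m[k] for m in models], axis=0) for k in models[0]}

def greedy_soup(models, score_fn):
    """One pass over models ranked by individual score (linear in the number
    of models). Assumed rule: a candidate is kept only if the averaged
    model's score does not decrease."""
    ranked = sorted(models, key=score_fn, reverse=True)
    kept = [ranked[0]]
    best = score_fn(average_weights(kept))
    for cand in ranked[1:]:
        trial = score_fn(average_weights(kept + [cand]))
        if trial >= best:
            kept.append(cand)
            best = trial
    return average_weights(kept), len(kept)

# Toy stand-in for held-out accuracy (hypothetical): closeness to a target.
target = np.array([0.0, 0.0])
score_fn = lambda m: -float(np.sum((m["w"] - target) ** 2))

models = [
    {"w": np.array([1.0, 0.0])},
    {"w": np.array([-1.0, 0.0])},
    {"w": np.array([5.0, 5.0])},
]
soup, n_used = greedy_soup(models, score_fn)
print(soup["w"], n_used)
```

In this toy setting the first two models sit on opposite sides of the target, so averaging them improves the score, while the third model is rejected because adding it would pull the average away.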
Submission Number: 121