TL;DR: We propose a new architecture that, given a specialist dataset, quickly instantiates a small specialist model that performs well on that dataset.
Abstract: Machine learning models are routinely trained on a mixture of different data domains.
Different domain weights yield very different downstream performances.
We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model.
Our architecture consists of a bank of expert parameters, which are linearly combined to instantiate one model.
We learn the linear combination coefficients as a function of the input domain weights.
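To make the instantiation step concrete, below is a minimal PyTorch sketch of one way to merge a bank of expert parameters into a single specialist model. The class and method names, the single linear map from domain weights to coefficients, and the softmax normalization are illustrative assumptions; the abstract only states that the combination coefficients are a learned function of the input domain weights.

```python
# Minimal sketch (PyTorch, hypothetical names): instantiate one specialist
# parameter vector from a bank of expert parameters, with combination
# coefficients predicted from the input domain weights.
import torch
import torch.nn as nn


class SoupOfExperts(nn.Module):
    def __init__(self, n_experts: int, n_domains: int, param_dim: int):
        super().__init__()
        # Bank of expert parameters: each row holds one expert's flattened weights.
        self.experts = nn.Parameter(torch.randn(n_experts, param_dim) * 0.02)
        # Learned map from domain weights to combination coefficients
        # (a single linear layer here is an assumption, not the paper's exact choice).
        self.coeff_net = nn.Linear(n_domains, n_experts)

    def instantiate(self, domain_weights: torch.Tensor) -> torch.Tensor:
        # domain_weights: (n_domains,) histogram over data domains, summing to 1.
        coeffs = torch.softmax(self.coeff_net(domain_weights), dim=-1)  # (n_experts,)
        # Linear combination of the expert bank -> one specialist parameter vector.
        return coeffs @ self.experts  # (param_dim,)
```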
To train this architecture, we sample random domain weights, instantiate the corresponding model, and backpropagate through one batch of data sampled with these domain weights.
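A training step following this recipe might look like the sketch below. The Dirichlet sampling distribution and the helpers `sample_batch` and `loss_on_batch` are hypothetical stand-ins for the data pipeline and the language-modeling loss; only the overall pattern (sample domain weights, draw a matching batch, instantiate the model, backpropagate) follows the description above.

```python
# Minimal training-step sketch with hypothetical helpers.
import torch


def train_step(soup, loss_on_batch, sample_batch, n_domains, optimizer):
    # Sample random domain weights, e.g. from a flat Dirichlet
    # (this sampling distribution is an assumption for illustration).
    domain_weights = torch.distributions.Dirichlet(torch.ones(n_domains)).sample()
    # Draw one batch whose domain composition follows these weights
    # (sample_batch is a hypothetical data-loading helper).
    batch = sample_batch(domain_weights)
    # Instantiate the specialist parameters and evaluate the loss with them
    # (loss_on_batch is a hypothetical helper that runs the model
    # with the instantiated parameter vector).
    params = soup.instantiate(domain_weights)
    loss = loss_on_batch(params, batch)
    # Backpropagate through the instantiation into the expert bank and the
    # coefficient network.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```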
We demonstrate how our approach quickly obtains small specialized models that perform well on several language modeling tasks.
Soups-of-Experts are particularly appealing when one needs to ship many different specialist models quickly under a size constraint.
Lay Summary: We propose a new neural network architecture that holds many parameters trained jointly. Unlike standard architectures, when we want to use the model on a new task, we first select a small, relevant subset of its parameters, and then use only those parameters to address the new task. Since each task requires only a small number of parameters, the resulting models are very efficient.
Primary Area: General Machine Learning->Scalable Algorithms
Keywords: Pre-training, Specialization, Domain Adaptation, Small Models
Submission Number: 12107