Soup Kitchen: Mixing Exotic Model Soups across Labels, Losses, and Data

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: soups, transfer, generalization, self-supervision
TL;DR: We fine-tune and mix novel model soups whose ingredients come from supervised and self-supervised learning across different losses, data, and tasks, showing that such soups are possible and improve predictions.
Abstract: Model soups split a model into multiple models (by fine-tuning) and then merge them back into one model (by mixing) to improve accuracy, robustness, and more. How to fine-tune and mix these multiple models, or ingredients, deserves closer examination to keep turning more train-time computation into more improvement. In this work we fine-tune novel ingredients and analyze their mixtures to produce more exotic soups for visual recognition that nevertheless work. For a soup to be possible, the ingredients are known to require a common initialization for fine-tuning, but they can vary in their fine-tuning configurations. In existing soups, ingredients vary in their optimization noise, hyperparameters, datasets, and output rewards or input perturbations. However, all known soups are mixed from supervised ingredients trained with the same loss on labeled data. We show for the first time that (1) ingredients can be fine-tuned without labels by self-supervision and can vary in their self-supervision hyperparameters (e.g., masking rate), (2) soups can be mixed across supervised and self-supervised losses (e.g., MAE and MoCo v3), (3) soups can be mixed across tasks and across partitions of the training data, and (4) ingredients fine-tuned by self-supervision on the test data are possible and improve predictions. Our exotic soups provide $1$–$3\%$ improvements on ImageNet variants and up to $10\%$ improvement on VTAB, with remarkable consistency across our novel ingredients from self-supervision, partitioning, and adaptation to the test data.
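To make the mixing step concrete, here is a minimal sketch of the standard uniform-soup operation the abstract builds on: averaging the weights of several ingredient models fine-tuned from one shared initialization. This is an illustrative PyTorch sketch, not the paper's released code; the checkpoint paths and backbone are hypothetical stand-ins.

```python
# Uniform soup sketch: average parameter tensors across ingredient checkpoints
# that were all fine-tuned from the same initialization.
import torch

def mix_soup(ingredient_paths):
    """Return a state dict whose parameters are the element-wise mean
    of the parameters in the given ingredient checkpoints."""
    state_dicts = [torch.load(p, map_location="cpu") for p in ingredient_paths]
    soup = {}
    for key in state_dicts[0]:
        # Stack the same parameter from every ingredient and take the mean.
        soup[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return soup

# Usage (hypothetical paths and model class): load the mixed weights into a
# model with the same architecture as the ingredients.
# model = MyBackbone()
# model.load_state_dict(mix_soup(["ingredient_a.pt", "ingredient_b.pt"]))
```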
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 24742