MASS: MoErging through Adaptive Subspace Selection

Published: 23 Sept 2025, Last Modified: 17 Nov 2025, UniReps 2025, CC BY 4.0
Track: Extended Abstract Track
Keywords: model merging, routing, task vectors
TL;DR: MASS adaptively merges fine-tuned models without data or retraining, achieving near–fine-tuned accuracy at a fraction of the storage cost.
Abstract: Model merging has emerged as a lightweight alternative to ensembling, combining multiple fine-tuned models into a single set of parameters without additional training. However, existing methods rarely match the accuracy of individually fine-tuned models. We introduce MASS (MoErging through Adaptive Subspace Selection), a training-free approach that narrows this gap while maintaining near state-of-the-art performance across tasks. MASS leverages low-rank decompositions of task-specific updates, storing only the most salient singular components and merging them into a shared model. At inference, a data-free, non-parametric router selects the most relevant subspace (or combination of subspaces) based on intermediate features. This adds only a two-pass inference overhead and a ~2x storage cost relative to a single pretrained model, regardless of the number of tasks. Evaluated on CLIP-based image classification with ViT-B-16, ViT-B-32, and ViT-L-14 across 8, 14, and 20 tasks, MASS achieves up to ~98% of the accuracy of separate fine-tuned models, establishing a new state-of-the-art while remaining far more storage-efficient than ensembling.
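The abstract describes two mechanisms: compressing each task-specific update via truncated SVD, and routing inputs to the most relevant stored subspace at inference via a two-pass scheme. The sketch below illustrates both on a single linear layer; the rank k, the projection-energy routing score, and the single-layer setup are illustrative assumptions for exposition, not the paper's reference implementation.

```python
# Minimal sketch of a MASS-style pipeline on one linear layer (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)
d, k, num_tasks = 64, 4, 3

W_pre = rng.standard_normal((d, d)) / np.sqrt(d)            # pretrained weight
W_ft = [W_pre + 0.1 * rng.standard_normal((d, d))           # stand-ins for fine-tuned weights
        for _ in range(num_tasks)]

# 1) Compress each task vector (fine-tuned minus pretrained) with a truncated SVD,
#    storing only the top-k singular components (the "salient subspace").
subspaces = []
for W in W_ft:
    U, S, Vt = np.linalg.svd(W - W_pre, full_matrices=False)
    subspaces.append((U[:, :k], S[:k], Vt[:k]))             # O(k*d) floats per task

# 2) Data-free, non-parametric routing: score each stored subspace by how much
#    of a feature vector's energy lies in its input subspace (an assumed score).
def route(x):
    scores = [np.linalg.norm(Vt @ x) for (_, _, Vt) in subspaces]
    return int(np.argmax(scores))

# Two-pass inference: pass 1 computes features with the shared pretrained weight
# to pick a subspace; pass 2 applies the pretrained weight plus that low-rank update.
x = rng.standard_normal(d)
t = route(W_pre @ x)                                        # pass 1: routing
U, S, Vt = subspaces[t]
y = W_pre @ x + U @ (S * (Vt @ x))                          # pass 2: adapted forward
```

Because only k singular triplets are kept per task and the router is non-parametric, total storage stays near a constant multiple of the pretrained model, consistent with the ~2x figure quoted in the abstract.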
Submission Number: 84