Keywords: Model Merging, Multi-task Learning, Task Arithmetic
Abstract: Model merging has emerged as a promising technique for integrating multiple fine-tuned models into a single unified model without additional training. This paradigm is particularly appealing in resource-constrained scenarios where access to data or retraining is limited. Existing techniques—such as Task Arithmetic, Ties-Merging, and AdaMerging—achieve competitive results but typically rely on extensive hyperparameter tuning, which can be prohibitively expensive for large-scale models. In this work, we propose a hyperparameter-robust merging method that reframes the problem as the estimation of a unified task vector capturing the principal directions of each task (i.e., its dominant singular vectors). We formalize this process as the Gram-weighted Mahalanobis Fréchet mean (GMF-Mean), a convex optimization problem that admits a closed-form solution. Our theoretical analysis shows that GMF-Mean inherently adapts to both orthogonal (non-interfering) and conflicting (collinear but opposing) task interactions by automatically modulating the magnitudes of the principal directions. This property eliminates the need for the costly hyperparameter tuning commonly required by Task Arithmetic-based methods. Empirical results on vision, language, and vision-language models show that GMF-Mean achieves performance competitive with state-of-the-art baselines, while remaining training-free, data-free, and hyperparameter-robust. These properties position GMF-Mean as a practical solution for real-world deployment.
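To make the closed form concrete, below is a minimal NumPy sketch of one plausible instantiation of the GMF-Mean described in the abstract. It assumes a per-layer formulation in which each task's Gram weight is built from the top singular directions of its weight delta; the function names (`gram_weight`, `gmf_mean`), the squared-singular-value weighting, the `rank` cutoff, and the small regularizer `eps` are our assumptions for illustration, not details from the paper.

```python
import numpy as np

def gram_weight(delta, rank):
    """Gram matrix spanned by the top-`rank` singular directions of a
    per-layer task vector (weight delta), weighted by squared singular
    values. (Assumed construction; the paper's exact Gram may differ.)"""
    U, S, _ = np.linalg.svd(delta, full_matrices=False)
    Ur, Sr = U[:, :rank], S[:rank]
    return (Ur * Sr**2) @ Ur.T

def gmf_mean(deltas, rank=4, eps=1e-8):
    """Closed-form Frechet mean under per-task Mahalanobis metrics:
        argmin_X  sum_i tr((X - D_i)^T G_i (X - D_i))
      = (sum_i G_i)^{-1} sum_i G_i D_i.
    Setting the gradient to zero gives the linear system solved below."""
    grams = [gram_weight(d, rank) for d in deltas]
    lhs = sum(grams) + eps * np.eye(deltas[0].shape[0])  # regularize inverse
    rhs = sum(G @ d for G, d in zip(grams, deltas))
    return np.linalg.solve(lhs, rhs)

# Toy usage: two partially conflicting task deltas on a 6x3 "layer".
rng = np.random.default_rng(0)
base = rng.standard_normal((6, 3))
merged = gmf_mean([base, -0.5 * base + 0.1 * rng.standard_normal((6, 3))])
print(merged.shape)  # (6, 3)
```

Qualitatively, this matches the behavior claimed in the abstract: when the tasks' principal subspaces are orthogonal, each Gram matrix acts on its own subspace and both contributions survive; when they are collinear but opposing, the weighted average shrinks the shared direction's magnitude, with no scaling hyperparameter to tune.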
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 10810