TL;DR: We propose isotropic merging in the common and task-specific subspaces of weight update matrices, achieving state-of-the-art model merging results across vision and language tasks for both fully fine-tuned and LoRA-adapted models.
Abstract: Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. In this paper, we investigate the key characteristics of task matrices -- weight update matrices applied to a pre-trained model -- that enable effective merging. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement over the pre-trained model. Based on this, we propose an isotropic merging framework that flattens the singular value spectrum of task matrices, enhances alignment, and reduces the performance gap. Additionally, we incorporate both common and task-specific subspaces to further improve alignment and performance. Our proposed approach achieves state-of-the-art performance on vision and language tasks across various sets of tasks and model scales. This work advances the understanding of model merging dynamics, offering an effective methodology to merge models without requiring additional training.
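To make the spectrum-flattening idea from the abstract concrete, here is a minimal illustrative sketch of merging per-layer weight matrices by summing task matrices and replacing the singular values of the sum with their mean (an isotropic spectrum). Function and variable names (flatten_spectrum, isotropic_merge, alpha) and the choice of the mean singular value are assumptions made for illustration; see the linked repository for the authors' actual implementation, including the task-specific-subspace variant.

```python
# Illustrative sketch only -- not the authors' exact implementation.
import torch

def flatten_spectrum(M: torch.Tensor) -> torch.Tensor:
    """Replace the singular values of M with their mean, flattening the spectrum."""
    U, S, Vh = torch.linalg.svd(M, full_matrices=False)
    S_iso = torch.full_like(S, S.mean())  # isotropic (uniform) singular values
    return U @ torch.diag(S_iso) @ Vh

def isotropic_merge(pretrained: torch.Tensor,
                    finetuned: list[torch.Tensor],
                    alpha: float = 1.0) -> torch.Tensor:
    """Merge task-specific weight matrices via spectrum flattening of their summed updates."""
    # Task matrix = weight update applied to the pre-trained model by each fine-tuned model.
    task_matrices = [W - pretrained for W in finetuned]
    merged_update = flatten_spectrum(torch.stack(task_matrices).sum(dim=0))
    return pretrained + alpha * merged_update
```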
Lay Summary: Imagine having several AI experts, each trained for a specific job (like one identifying cats, another recognizing handwritten numbers). We want to combine these experts into a single AI that can perform all these jobs well. However, simply mixing their knowledge together often results in a combined AI that doesn't excel at any particular task, performing worse than the original experts.
Our research uncovers a key reason why this happens and offers a better way to merge these AI experts. We found that what matters is how the internal "learnings" or adjustments of each expert align with one another: if they point in compatible directions, the merged AI performs much better. Based on this, we developed a new method. First, we make these internal learnings more "balanced" and uniform (we call this "isotropic"), which helps them align better and boosts performance. Then, we go a step further by carefully preserving not only the common knowledge shared across all tasks but also the unique, specific knowledge from each individual expert.
Our combined approach creates a significantly more capable AI for handling multiple tasks, often achieving top-tier results across various scenarios, all without needing any additional, costly retraining. This helps ensure "no task is left behind" when combining specialized AI models.
Link To Code: https://github.com/danielm1405/iso-merging
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: Model merging
Submission Number: 399