Keywords: Model Merging, Fusion of Experts, Efficiency, Large Language Models
Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
However, two challenges remain: (a) interference between the models being merged and (b) heterogeneous test data. Because of these issues, traditional model merging methods often show a significant performance gap relative to fine-tuned models.
Moreover, a one-size-fits-all merged model lacks the flexibility to handle diverse test data, leading to performance degradation.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
In light of this, we propose Twin-Merging, a method comprising two principal stages:
(1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency;
(2) dynamically merging shared and task-specific knowledge based on the input.
This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data.
Extensive experiments on $20$ datasets spanning language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of $28.34\%$ in absolute normalized score on discriminative tasks and even surpassing the fine-tuned upper bound on generative tasks.
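To make the two-stage recipe in the abstract concrete, below is a minimal PyTorch sketch for a single linear layer. The specific choices here, defining exclusive knowledge as the residual of each task vector from the shared average, compressing it via truncated SVD, and mixing with softmax router weights, as well as the names `build_shared_and_exclusive`, `dynamic_merge`, `rank`, and `router_weights`, are illustrative assumptions and not the authors' released implementation.

```python
# Minimal sketch of the two-stage idea described in the abstract (assumed details).
import torch

def build_shared_and_exclusive(finetuned_weights, pretrained_weight, rank=8):
    """Stage 1: modularize knowledge into one shared component plus
    compressed task-exclusive components (truncated SVD is an assumed
    compression choice)."""
    task_vectors = [w - pretrained_weight for w in finetuned_weights]
    shared = torch.stack(task_vectors).mean(dim=0)            # shared knowledge
    exclusive = []
    for tv in task_vectors:
        residual = tv - shared                                 # task-exclusive part
        U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
        exclusive.append((U[:, :rank] * S[:rank], Vh[:rank]))  # low-rank compression
    return shared, exclusive

def dynamic_merge(pretrained_weight, shared, exclusive, router_weights):
    """Stage 2: compose a weight on the fly from the shared component and
    an input-dependent, router-weighted mix of exclusive components."""
    merged = pretrained_weight + shared
    for w, (A, B) in zip(router_weights, exclusive):
        merged = merged + w * (A @ B)                          # re-expand low-rank term
    return merged

# Toy usage with random matrices standing in for one linear layer of three experts.
torch.manual_seed(0)
pre = torch.randn(64, 64)
fts = [pre + 0.01 * torch.randn(64, 64) for _ in range(3)]
shared, excl = build_shared_and_exclusive(fts, pre, rank=4)
weights = torch.softmax(torch.randn(3), dim=0)                 # e.g. from an input-conditioned router
layer_w = dynamic_merge(pre, shared, excl, weights)
print(layer_w.shape)  # torch.Size([64, 64])
```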
Supplementary Material: zip
Primary Area: Natural language processing
Submission Number: 8827