Keywords: Model Merging, Mixup Interpolation, Large Language Models (LLMs)
TL;DR: Mixup Model Merge (M³) is a simple yet effective method that improves model merging by performing randomized parameter interpolation between task-specific LLMs using a Beta distribution to optimize contribution ratios.
Abstract: Model merging aims to integrate multiple task-specific models into a unified model that inherits their capabilities without additional training. Existing model merging methods often fail to account for the varying contribution ratios of different task-specific models to the final merged model. In this paper, we propose Mixup Model Merge ($M^3$), a simple yet effective method inspired by the randomized linear interpolation strategy of the Mixup data augmentation technique. $M^3$ performs randomized linear interpolation in parameter space between two task-specific LLMs, with interpolation coefficients sampled from a Beta distribution to explore diverse contribution ratios. This controllable randomness allows $M^3$ to outperform standard equal-ratio merging by discovering better contribution ratio combinations. Extensive experiments show that $M^3$ (1) significantly improves merged LLM performance across tasks; (2) enhances out-of-distribution and adversarial robustness; (3) surpasses the benefits that the sparsification method DARE brings to model merging, and can be combined with DARE for further gains; and (4) balances exploration efficiency and diversity in contribution ratios through the Beta distribution's shape parameters. The code is provided in the supplementary materials.
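The core operation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the two task-specific models share an architecture and are represented as dictionaries of NumPy arrays, and the function and parameter names (`m3_merge`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np

def m3_merge(params_a, params_b, alpha=2.0, beta=2.0, rng=None):
    """Hypothetical sketch of M^3: merge two parameter dicts using a single
    interpolation ratio lambda sampled from Beta(alpha, beta)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, beta)  # contribution ratio of model A, in [0, 1]
    # Linear interpolation in parameter space: lam * A + (1 - lam) * B
    merged = {k: lam * params_a[k] + (1.0 - lam) * params_b[k]
              for k in params_a}
    return merged, lam

# Toy usage with two dummy "models"; in practice these would be LLM weights.
pa = {"w": np.ones(3)}
pb = {"w": np.zeros(3)}
merged, lam = m3_merge(pa, pb, rng=np.random.default_rng(0))
```

Tuning `alpha` and `beta` reshapes the Beta distribution: symmetric values concentrate samples near equal-ratio merging (lambda ≈ 0.5), while smaller values push mass toward the extremes, trading exploration diversity against efficiency, as the abstract notes.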
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8169