Keywords: Model Merging, Domain Generalization, Robustness, Foundation Model
Abstract: Generalization under distribution shift is a primary goal of modern machine learning. Ensemble methods, including both output-space ensembles and weight-space ensembles (model merging), are renowned for their robust generalization in multi-task settings, leveraging the diverse features of source models to improve cross-task transferability. While most work on model merging focuses on constructing diverse pools of task vectors obtained from foundation models trained on different tasks, we also emphasize the quality of each source. In this paper, we introduce a novel method for selectively merging task vectors to achieve superior generalization on target domains. Our approach uniquely considers both the diversity and the quality of individual models. Using Determinantal Point Processes (DPPs), we propose a probabilistic framework that selects which models to average in a plug-and-play manner, ensuring a balanced consideration of quality and diversity. We provide theoretical support for our hypothesis that this dual consideration yields a tighter generalization error bound for the merged model. Empirically, we present experiments in an out-of-distribution setting where the assumption of identically distributed source and target domains is significantly violated.
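The abstract does not give implementation details, so the following is only a minimal sketch of how a DPP-style selection over task vectors might trade off quality and diversity. The kernel construction (quality-weighted cosine similarity), the greedy log-determinant selection, the function names, and the simple averaging step are all assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def select_and_merge(task_vectors, quality, k):
    """Illustrative DPP-style selection of task vectors (assumed, not the paper's method).

    task_vectors: (n, d) array, one flattened task vector per source model
    quality:      (n,) array of per-model quality scores (e.g. validation accuracy)
    k:            number of models to keep before averaging
    """
    # Diversity kernel: cosine similarity between task vectors.
    V = task_vectors / np.linalg.norm(task_vectors, axis=1, keepdims=True)
    S = V @ V.T
    # Quality-weighted L-ensemble kernel: L = diag(q) S diag(q).
    L = quality[:, None] * S * quality[None, :]

    # Greedy MAP approximation: repeatedly add the item that most increases log det
    # of the selected submatrix, balancing high quality against redundancy.
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(len(task_vectors)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        if best is None:
            break
        selected.append(best)

    # Merge: simple average of the selected task vectors.
    merged = task_vectors[selected].mean(axis=0)
    return selected, merged

# Purely illustrative usage with random data.
rng = np.random.default_rng(0)
tvs = rng.normal(size=(8, 1024))    # 8 source models, flattened task vectors
q = rng.uniform(0.5, 1.0, size=8)   # hypothetical quality scores
chosen, merged_vector = select_and_merge(tvs, q, k=3)
print(chosen, merged_vector.shape)
```

In this sketch, scaling the similarity kernel by the quality scores makes high-quality models more likely to be selected, while the determinant penalizes selecting near-duplicate task vectors, which is one common way DPP-based subset selection balances the two criteria.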
Submission Number: 16