Keywords: model merging, federated transfer learning with foundation models, multitask learning, task arithmetic, multi-objective optimization, federated learning
TL;DR: We provide a compute-efficient algorithm for finding the Pareto front that represents the trade-offs arising in model merging from conflicting objectives across tasks.
Abstract: Model merging has emerged as an effective approach to combine multiple single-task models into a multitask model. However, existing methods focus on enhancing average task accuracy, often neglecting the trade-offs between different tasks. We introduce Model Merging with Amortized Pareto Front (MAP), a novel low-compute algorithm that efficiently identifies a Pareto set of scaling coefficients for merging multiple models. MAP uses a quadratic surrogate model to approximate task metrics, enabling amortized inference. Our approach is particularly valuable in federated learning scenarios, where it can balance performance across diverse client datasets while respecting privacy constraints and minimizing communication overhead. Experimental results on vision and natural language processing tasks demonstrate MAP's ability to accurately identify the Pareto front, offering practitioners a range of solutions with different trade-offs. This makes MAP a promising method for optimizing multitask performance in both centralized and distributed learning environments, addressing the challenges of task conflicts and privacy preservation in model merging.
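The sketch below illustrates, in a minimal and hypothetical form, the pipeline the abstract describes: merge task vectors with per-task scaling coefficients, fit a quadratic surrogate of each task metric from a handful of expensive evaluations, then scan many candidate coefficient vectors cheaply through the surrogate and keep the non-dominated ones. This is not the authors' implementation; all names (`merge`, `fit_surrogates`, `pareto_indices`) and the toy metric functions are illustrative assumptions.

```python
import numpy as np

def merge(base, task_vectors, lam):
    """Task-arithmetic merge: theta = theta_0 + sum_i lam_i * (theta_i - theta_0)."""
    return base + sum(li * tv for li, tv in zip(lam, task_vectors))

def quad_features(lam):
    """Monomials up to degree 2 in the scaling coefficients."""
    lam = np.asarray(lam, dtype=float)
    cross = np.outer(lam, lam)[np.triu_indices(lam.size)]
    return np.concatenate([[1.0], lam, cross])

def fit_surrogates(lams, metrics):
    """Least-squares fit of one quadratic surrogate per task metric."""
    X = np.stack([quad_features(l) for l in lams])
    coef, *_ = np.linalg.lstsq(X, np.asarray(metrics), rcond=None)
    return coef  # shape: (n_features, n_tasks)

def pareto_indices(points):
    """Indices of non-dominated rows (higher is better on every task)."""
    return [i for i, p in enumerate(points)
            if not any(np.all(q >= p) and np.any(q > p)
                       for j, q in enumerate(points) if j != i)]

# Usage sketch with a toy 2-task conflict: probe a few coefficient vectors
# with the (expensive) true metrics, fit the surrogates, then scan many
# candidates cheaply and keep only the Pareto-optimal ones.
def toy_metrics(lam):  # stand-in for real per-task evaluations of a merged model
    return np.array([1 - (lam[0] - 0.8) ** 2 - 0.3 * lam[1] ** 2,
                     1 - (lam[1] - 0.8) ** 2 - 0.3 * lam[0] ** 2])

rng = np.random.default_rng(0)
probes = rng.uniform(0.0, 1.0, size=(20, 2))
coef = fit_surrogates(probes, [toy_metrics(l) for l in probes])
candidates = rng.uniform(0.0, 1.0, size=(2000, 2))
predicted = np.stack([quad_features(l) for l in candidates]) @ coef
front = candidates[pareto_indices(predicted)]
print(f"{len(front)} Pareto-optimal scaling-coefficient vectors found")
```

Under these assumptions, the cost of tracing the Pareto front is amortized: only the probe evaluations touch the real models, while the dense candidate scan runs entirely on the fitted quadratic surrogates.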
Submission Number: 53