Representation Surgery in Model Merging with Probabilistic Modeling

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Model merging aims to achieve multitask performance by merging multiple expert models without access to the raw training data. Recent research identified the representation bias of model merging: a discrepancy between the representation distributions of the merged model and the individual models that hinders the performance of model merging methods. To mitigate this bias, a task-specific MLP, Surgery, was built to model the bias, which is then removed from the merged representation. However, this strategy remains suboptimal because its deterministic formulation limits the modeling capability. To address this issue, we present ProbSurgery, a probabilistic module specifically designed to accurately model the representation bias. This module generates an embedding distribution for each sample and outputs the representation bias through a sampling process. ProbSurgery offers superior representational capacity by naturally handling the uncertainty arising from parameter interference when merging multiple models. In addition, we provide a theoretical analysis that reveals the advantage of the probabilistic formulation and propose an extension of ProbSurgery adapted to the task-sharing setting. Extensive experiments verify the effectiveness of ProbSurgery for representation surgery while maintaining generalization in real-world scenarios, including out-of-distribution and domain-shift challenges.
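To make the mechanism described above concrete, below is a minimal, hedged sketch of what a probabilistic surgery module could look like: for each merged representation it predicts a per-sample Gaussian (mean and log-variance) over the representation bias, draws a reparameterized sample during training, and subtracts the estimated bias from the merged feature. This is an illustrative assumption of the idea, not the authors' implementation; the class name `ProbSurgerySketch` and the hidden width are hypothetical.

```python
# Illustrative sketch only (hypothetical names, not the authors' code):
# a probabilistic "surgery" head that models the representation bias as a
# per-sample Gaussian and corrects the merged representation by sampling.
import torch
import torch.nn as nn


class ProbSurgerySketch(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Heads predicting the mean and log-variance of the bias distribution.
        self.mu_head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, feat_dim)
        )
        self.logvar_head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, feat_dim)
        )

    def forward(self, merged_feat: torch.Tensor) -> torch.Tensor:
        mu = self.mu_head(merged_feat)          # mean of the estimated bias
        logvar = self.logvar_head(merged_feat)  # log-variance (uncertainty)
        if self.training:
            eps = torch.randn_like(mu)
            bias = mu + eps * torch.exp(0.5 * logvar)  # reparameterized sample
        else:
            bias = mu                                  # use the mean at test time
        return merged_feat - bias                      # corrected representation
```

In this sketch the predicted variance lets the module express how uncertain the bias estimate is for each sample, which is the intuition behind modeling the bias probabilistically rather than with a single deterministic MLP output.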
Lay Summary: When we merge multiple machine learning models to solve several tasks at once, we usually hope to get the benefits of all the individual models — without needing the original data used to train them. But this process often runs into a hidden issue: the combined model may "see" the world differently than the originals, due to something called representation bias. This bias causes the merged model's internal understanding (its "representations") to shift in a way that harms performance. Previous work tried to fix this using a simple correction layer, but its effectiveness was limited because it treated the bias in a rigid, fixed way. Our new method, ProbSurgery, tackles the problem differently: it treats the bias as arising from uncertainty and models it using probabilities. This allows our method to better handle the messiness that arises when many models are combined. We also provide a mathematical explanation of why this probabilistic method works better, and we show that it can be extended to more collaborative learning settings. Experiments confirm that ProbSurgery consistently improves merged models across a range of real-world tasks.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: Model Merging, Representation Learning
Submission Number: 2386