Keywords: model merging, model stealing, privacy, security
Abstract: Model merging is a promising technique for enhancing the capabilities of neural networks (NNs) by integrating multiple downstream fine-tuned models without requiring access to clients' raw data or substantial computational resources. However, conventional model merging typically requires collecting the full set of fine-tuned model parameters from multiple clients, which may expose those clients to model-privacy risks. An emerging approach, known as partial model merging (PMM), mitigates this risk by splitting each model into a private part and a shared part, where only the shared part is merged while the private part remains local to each client. Despite fusing only a subset of parameters, PMM can still achieve performance competitive with full-parameter sharing. However, the privacy properties of PMM remain underexplored. In this paper, we propose a novel model stealing attack and assess the risk of reconstructing the unshared private part of a partially merged model under eight attack scenarios with varying prior knowledge (i.e., partial training data, model parameters, and/or model structure). Our comprehensive experiments reveal that merging NNs without adequate protection leaves clients highly vulnerable: even when only a small fraction of the training data, model parameters, or model structure is exposed, adversaries can still recover a significant portion of the private model's performance.
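The sketch below illustrates the partial-merging setup described in the abstract: each client keeps a private part local while only the shared part is averaged across clients. The backbone/head split, the layer sizes, and plain parameter averaging are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal PyTorch sketch of partial model merging (PMM).
# NOTE: the shared/private split and the averaging merge rule are assumptions
# made for illustration; they are not taken from the paper.
import torch
import torch.nn as nn


class ClientModel(nn.Module):
    def __init__(self, in_dim: int = 32, hidden: int = 64, num_classes: int = 10):
        super().__init__()
        # Shared part: merged across clients.
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Private part: never leaves the client.
        self.private = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.private(self.shared(x))


def merge_shared(clients: list[ClientModel]) -> None:
    """Average only the shared parameters in place; private parts are untouched."""
    with torch.no_grad():
        for name, _ in clients[0].shared.named_parameters():
            avg = torch.stack(
                [dict(c.shared.named_parameters())[name] for c in clients]
            ).mean(dim=0)
            for c in clients:
                dict(c.shared.named_parameters())[name].copy_(avg)


# Usage: two independently fine-tuned clients merge their shared parts only,
# so an adversary observing the merged model never sees the private heads.
clients = [ClientModel(), ClientModel()]
merge_shared(clients)
```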
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 5466