Sensitivity as a Shield: Inducing Sensitivity to Prevent Unauthorized Model Merging

ICLR 2026 Conference Submission 15232 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language models, model merging, unmergeable
TL;DR: This paper proposes a new method to prevent unauthorized model merging.
Abstract: Training large language models (LLMs) from scratch is costly, driving interest in leveraging open-source LLMs for domain-specific tasks without additional training. Model merging has emerged as an efficient way to integrate knowledge from fine-tuned models, but it raises security concerns about unauthorized model merging. Existing approaches primarily focus on post-hoc mechanisms to detect malicious exploitation of released models. In contrast, we propose a novel paradigm: safeguarding models against unauthorized merging before misuse occurs. Specifically, after training a model with strong capabilities in a specific domain, we propose an unmergeable method that preserves the model's domain-specific performance while preventing malicious users from acquiring its capabilities through model merging. We identify the critical role of neuron-sensitive weight regions in enabling unmergeability and propose two complementary operations, global and local sensitivity processing, to enforce protection. Both theoretical analysis and empirical evaluations demonstrate the effectiveness of our approach in maintaining task performance while making models resistant to unauthorized merging.
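For readers unfamiliar with weight-space merging, the sketch below is a minimal, illustrative example (not the authors' method): it shows task-arithmetic merging of fine-tuned weights and a hypothetical "sensitivity-inducing" perturbation applied to the most sensitive weight positions of a model before release. The function names, the top-fraction selection, and the rescaling rule are all assumptions made for illustration; the paper's actual global and local sensitivity processing is defined in the full text.

```python
# Illustrative sketch only; hypothetical protection step, not the paper's algorithm.
import torch


def task_vector(finetuned: dict, base: dict) -> dict:
    """Per-parameter difference between fine-tuned and base weights."""
    return {k: finetuned[k] - base[k] for k in base}


def merge(base: dict, task_vectors: list, alpha: float = 0.5) -> dict:
    """Task-arithmetic merge: add scaled task vectors to the base weights."""
    merged = {k: v.clone() for k, v in base.items()}
    for tv in task_vectors:
        for k in merged:
            merged[k] += alpha * tv[k]
    return merged


def protect(finetuned: dict, base: dict, sensitivity: dict,
            top_frac: float = 0.05, scale: float = 5.0) -> dict:
    """Hypothetical protection: amplify the weight deltas at the most
    sensitive positions (top `top_frac` by sensitivity score), so that
    averaging these positions with another model's weights moves them
    far from either model's optimum and degrades the merged model."""
    protected = {k: v.clone() for k, v in finetuned.items()}
    for k, score in sensitivity.items():
        delta = (finetuned[k] - base[k]).view(-1)
        k_top = max(1, int(top_frac * score.numel()))
        idx = torch.topk(score.view(-1), k_top).indices
        flat = protected[k].view(-1)  # view: writes modify protected[k]
        flat[idx] = base[k].view(-1)[idx] + scale * delta[idx]
    return protected


# Toy usage with random tensors; sensitivity here is just |delta| as a proxy.
base = {"w": torch.randn(4, 4)}
finetuned = {"w": base["w"] + 0.1 * torch.randn(4, 4)}
sens = {"w": (finetuned["w"] - base["w"]).abs()}
released = protect(finetuned, base, sens)
merged = merge(base, [task_vector(released, base)])
```

In this toy setting, the sensitivity proxy is simply the magnitude of the weight change; a real implementation would use a proper sensitivity measure (e.g., a gradient- or Hessian-based score) and would be designed so the released model's own task performance is preserved, as the abstract claims for the proposed method.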
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15232