Evading Protections Against Unauthorized Data Usage via Limited Fine-tuning

TMLR Paper7386 Authors

06 Feb 2026 (modified: 18 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: Text-to-image diffusion models, such as Stable Diffusion, have demonstrated exceptional potential for generating high-quality images. However, recent studies have raised concerns about the use of unauthorized data to train these models, which can lead to intellectual property infringement or privacy violations. A promising approach to mitigating these issues is to embed a signature in the model that can be detected or verified from its generated images. Existing works also aim to prevent training on protected images by degrading generation quality, for example by injecting adversarial perturbations into the training data. In this paper, we propose RATTAN, which effectively evades such protection methods by removing protective perturbations from images and inducing catastrophic forgetting of the corresponding learned features in the model. RATTAN leverages the diffusion process to generate controlled images from the protected inputs, preserving high-level features while discarding the low-level details that carry the embedded protective pattern. A small number of generated images (e.g., 10) are then used to fine-tune a marked model to remove the learned features. Our experiments on four datasets, two different IP protection methods, and 300 text-to-image diffusion models reveal that, while some protections already suffer from weak memorization, RATTAN can reliably bypass stronger defenses, exposing fundamental limitations of current protections and highlighting the need for more robust safeguards.
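The abstract's core intuition can be illustrated numerically. The following is a minimal, hypothetical sketch (not the paper's implementation): under the standard forward diffusion q(x_t | x_0) = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, a low-amplitude protective perturbation (here assumed to be ~8/255 per pixel, a common adversarial budget) falls far below the injected Gaussian noise at a mid-trajectory timestep, while high-amplitude coarse structure remains comparable to the noise level and hence recoverable by a denoiser. All signal shapes, amplitudes, and the noise level `abar_t` below are illustrative assumptions.

```python
import numpy as np

# Hypothetical 1-D stand-in for an image: coarse content (amplitude 1)
# plus a high-frequency protective perturbation of amplitude ~8/255.
n = 4096
grid = np.linspace(0.0, 2.0 * np.pi, n)
clean = np.sin(grid)                          # coarse, high-level content
delta = (8.0 / 255.0) * np.sin(80.0 * grid)   # assumed protective perturbation

# Forward diffusion q(x_t | x_0) = sqrt(abar_t)*x_0 + sqrt(1-abar_t)*eps
abar_t = 0.5                                  # assumed mid-trajectory noise level
rng = np.random.default_rng(0)
eps = rng.standard_normal(n)
noisy = np.sqrt(abar_t) * (clean + delta) + np.sqrt(1.0 - abar_t) * eps

# Compare each component's scale against the injected noise.
noise_std = np.sqrt(1.0 - abar_t)             # std of the added Gaussian noise
perturb_std = np.sqrt(abar_t) * np.std(delta) # perturbation after scaling
content_std = np.sqrt(abar_t) * np.std(clean) # coarse content after scaling

# The perturbation sits ~45x below the noise floor; the coarse content
# is on the same order as the noise and can survive denoising.
print(f"noise/perturbation: {noise_std / perturb_std:.1f}")
print(f"content/noise:      {content_std / noise_std:.2f}")
```

This is why regenerating protected images through the diffusion process (as RATTAN does) can strip the embedded pattern: a denoiser trained on natural images has no incentive to reconstruct a component that is statistically indistinguishable from noise at that timestep.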
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Michele_Caprio1
Submission Number: 7386