Keywords: Text-to-Image Models, Copyright Infringement, Watermarking
Abstract: Text-to-image diffusion models, such as Stable Diffusion, have demonstrated exceptional potential in generating high-quality images. However, recent studies highlight concerns about the use of unauthorized data in training these models, which can lead to intellectual property infringement or privacy violations. A promising approach to mitigating these issues is to embed a signature in the model that can be detected or verified from its generated images. Other existing works aim to prevent training on protected images altogether by degrading generation quality, which is achieved by injecting adversarial perturbations into the training data. In this paper, we propose RATTAN, which effectively evades such protection methods by removing the protective perturbations from images and causing a model to catastrophically forget the learned protection features. It leverages the diffusion process for controlled image generation on the protected input, preserving high-level features while discarding the low-level details that the embedded pattern relies on. A small number of our generated images (e.g., 10) are then used to fine-tune the marked model and remove the learned features. Our experiments on four datasets, two IP protection methods, and 300 text-to-image diffusion models reveal that while some protections already suffer from weak memorization, RATTAN can reliably bypass even the stronger defenses, exposing fundamental limitations of current protections and highlighting the need for more robust designs.
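As a rough illustration of the regeneration step the abstract describes, the sketch below uses an off-the-shelf image-to-image diffusion pipeline (Hugging Face diffusers' StableDiffusionImg2ImgPipeline) to partially noise and then denoise a protected image. The moderate strength value adds enough noise to wash out low-level adversarial perturbations while the denoising pass preserves high-level content. This is only a plausible approximation of the idea, not the paper's actual RATTAN implementation; the model ID, prompt, strength, and file names are illustrative assumptions.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load a standard Stable Diffusion img2img pipeline (illustrative model choice).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical protected image carrying an embedded low-level perturbation.
protected = Image.open("protected.png").convert("RGB").resize((512, 512))

# strength controls how far the image is pushed into the forward diffusion
# process before denoising: a moderate value (assumed here, not from the
# paper) destroys high-frequency perturbations but keeps the semantics.
regenerated = pipe(
    prompt="a photo",          # generic prompt; the content comes from the image
    image=protected,
    strength=0.4,
    guidance_scale=7.5,
).images[0]

regenerated.save("regenerated.png")
```

A handful of such regenerated images would then serve as the fine-tuning set used to make the marked model forget the protection features, per the abstract.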
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 20155