Empirical Robustness of Pixel Diffusion Undermines Adversarial Perturbation as Protection against Diffusion-based Mimicry
Keywords: diffusion model; style mimicry; protection; safety
Abstract: Diffusion models have demonstrated impressive abilities in image editing and imitation, raising growing concerns about the protection of personal images and intellectual property. A common defense strategy is to apply adversarial perturbations that mislead a diffusion model into generating low-quality outputs. However, existing research has focused almost entirely on latent diffusion models while overlooking pixel-space diffusion models. Through extensive experiments, we show that nearly all attacks designed for latent diffusion models, as well as adaptive attacks aimed at pixel-space diffusion models, fail to compromise pixel-space models. Our analysis suggests that the vulnerability of latent diffusion models stems mainly from their encoder, whereas pixel-space diffusion models exhibit strong empirical robustness to adversarial perturbations. We further demonstrate that a pixel-space diffusion model can serve as an effective purifier, removing adversarial patterns crafted for latent diffusion models while preserving image integrity, which in turn allows most existing protection schemes to be bypassed. These findings challenge the assumption that adversarial perturbations provide reliable protection against diffusion-based mimicry and call for a reevaluation of their role as a protection mechanism.
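The abstract describes using a pixel-space diffusion model as a purifier: noise the (possibly adversarially perturbed) image part-way along the forward diffusion process, then denoise it back. Below is a minimal sketch of that general noise-and-denoise purification idea using the Hugging Face `diffusers` library; the checkpoint (`google/ddpm-cifar10-32`), the noise level `t_star`, and the overall structure are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of diffusion-based purification with a pixel-space DDPM.
# Assumptions: diffusers is installed, the image is a [1, 3, H, W] tensor in [-1, 1],
# and its resolution matches the chosen checkpoint.
import torch
from diffusers import DDPMScheduler, UNet2DModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet2DModel.from_pretrained("google/ddpm-cifar10-32").to(device)
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cifar10-32")
scheduler.set_timesteps(scheduler.config.num_train_timesteps)

@torch.no_grad()
def purify(x0: torch.Tensor, t_star: int = 200) -> torch.Tensor:
    """Add noise up to timestep t_star, then run the reverse process back to t=0."""
    x0 = x0.to(device)
    noise = torch.randn_like(x0)
    t = torch.tensor([t_star], device=device)
    # Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    x = scheduler.add_noise(x0, noise, t)
    # Reverse process, only from t_star down to 0.
    for step in scheduler.timesteps:
        if step > t_star:
            continue
        eps_pred = model(x, step).sample        # predicted noise
        x = scheduler.step(eps_pred, step, x).prev_sample
    return x.clamp(-1, 1).cpu()
```

The choice of `t_star` trades off how aggressively adversarial patterns are washed out against how much of the original image content is preserved; the paper's finding is that such purification can strip perturbations intended to protect images from latent-diffusion mimicry.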
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14428