Abstract: Diffusion models have demonstrated a remarkable capability to edit or imitate images, raising concerns about the safeguarding of intellectual property. To address these concerns, adversarial attacks that embed, into protected images, perturbations designed to fool the targeted diffusion model have emerged as a viable protection strategy. Consequently, diffusion models, like many other deep network models, are believed to be susceptible to adversarial attacks. However, in this work we draw attention to an important oversight in existing research: all previous studies have focused solely on attacking latent diffusion models (LDMs), neglecting adversarial examples for pixel-space diffusion models (PDMs). Through extensive experiments, we demonstrate that nearly all existing adversarial attack methods designed for LDMs, as well as adaptive attacks designed specifically for PDMs, fail when applied to PDMs. We attribute the vulnerability of LDMs to their encoders, which indicates that the diffusion process itself exhibits strong robustness against adversarial attacks. Building on this insight, we find that PDMs can be used as an off-the-shelf purifier to effectively remove adversarial patterns generated by LDM attacks while preserving the integrity of the images. Notably, most existing protection methods can be easily bypassed using PDM-based purification. We hope our findings prompt a reevaluation of adversarial examples for diffusion models as a potential protection method.
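To make the purification claim concrete, below is a minimal sketch of SDEdit-style purification with a pretrained pixel-space diffusion model; it is not the authors' exact pipeline. It assumes the Hugging Face `diffusers` library, the public checkpoint `google/ddpm-celebahq-256`, and a hypothetical noising depth `t_star` that trades off perturbation removal against image fidelity.

```python
# Sketch: purify a (possibly protected) image by partially noising it and then
# denoising it with an off-the-shelf pixel-space diffusion model.
import torch
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")  # assumed checkpoint
unet, scheduler = pipe.unet, pipe.scheduler


def purify(x_adv: torch.Tensor, t_star: int = 200) -> torch.Tensor:
    """Diffuse x_adv (shape (B, 3, 256, 256), values in [-1, 1]) to timestep
    t_star, then run the reverse process back to t = 0. t_star = 200 is a
    hypothetical default; larger values erase stronger perturbations but also
    more image detail."""
    scheduler.set_timesteps(scheduler.config.num_train_timesteps)
    noise = torch.randn_like(x_adv)
    # Forward diffusion: Gaussian noise partially destroys the adversarial pattern.
    x_t = scheduler.add_noise(x_adv, noise, torch.tensor(t_star))
    # Reverse diffusion from t_star down to 0 with the pretrained model.
    for step in scheduler.timesteps[scheduler.timesteps <= t_star]:
        with torch.no_grad():
            eps = unet(x_t, step).sample
        x_t = scheduler.step(eps, step, x_t).prev_sample
    return x_t.clamp(-1, 1)
```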
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dit-Yan_Yeung2
Submission Number: 4772