Keywords: Diffusion Models; Adversarial Perturbations; Privacy Protection
Abstract: Diffusion models have demonstrated a remarkable capability to edit or imitate images, which has raised concerns regarding the safeguarding of intellectual property. To address these concerns, adversarial attacks that embed perturbations into protected images to fool the targeted diffusion model have emerged as a viable solution. Consequently, diffusion models, like many other deep network models, are believed to be susceptible to adversarial attacks. However, in this work we draw attention to an important oversight in existing research: all previous studies have focused solely on attacking latent diffusion models (LDMs), neglecting adversarial examples for diffusion models in the pixel space (PDMs). Through extensive experiments, we demonstrate that nearly all existing adversarial attack methods designed for LDMs, as well as adaptive attacks designed for PDMs, fail when applied to PDMs. We attribute the vulnerability of LDMs to their encoders, indicating that the diffusion models themselves exhibit strong robustness against adversarial attacks. Building on this insight, we find that PDMs can be used as an off-the-shelf purifier to effectively eliminate adversarial patterns generated by LDMs while maintaining the integrity of the images. Notably, we highlight that most existing protection methods can be easily bypassed using PDM-based purification. We hope our findings prompt a reevaluation of adversarial examples for diffusion models as potential protection methods.
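The PDM-based purification mentioned in the abstract can be illustrated with a minimal, SDEdit-style sketch: diffuse the (possibly adversarially perturbed) image to an intermediate timestep with a pretrained pixel-space DDPM, then run the reverse process back to t = 0. This is only an assumed reading of the procedure; the abstract does not specify the exact method, and the checkpoint name `google/ddpm-celebahq-256` and noise level `t_star` below are illustrative choices, not the authors' settings.

```python
import torch
from diffusers import DDPMPipeline

# Load an off-the-shelf pixel-space diffusion model (illustrative checkpoint).
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256").to(device)
unet, scheduler = pipe.unet, pipe.scheduler

@torch.no_grad()
def purify(image: torch.Tensor, t_star: int = 200) -> torch.Tensor:
    """SDEdit-style purification sketch.

    image: tensor of shape (B, 3, 256, 256), scaled to [-1, 1].
    t_star: how far to diffuse; larger values remove stronger perturbations
            but also more image detail (an assumed hyperparameter).
    """
    image = image.to(device)
    # Forward diffusion: add Gaussian noise up to timestep t_star.
    noise = torch.randn_like(image)
    t = torch.tensor([t_star], device=device)
    x_t = scheduler.add_noise(image, noise, t)
    # Reverse diffusion: denoise from t_star back to 0 with the pretrained model.
    for step in scheduler.timesteps[scheduler.timesteps <= t_star]:
        eps = unet(x_t, step).sample
        x_t = scheduler.step(eps, step, x_t).prev_sample
    return x_t.clamp(-1.0, 1.0)
```

In this reading, the purifier never needs access to the protection method: the forward noise overwhelms the small adversarial perturbation, and the pixel-space model reconstructs a clean-looking image, which is consistent with the abstract's claim that most existing protections can be bypassed.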
Submission Number: 88