A Change of Heart: Backdoor Attacks on Security-Centric Diffusion Models

Changjiang Li; Ren Pang; Bochuan Cao; Jinghui Chen; Ting Wang

A Change of Heart: Backdoor Attacks on Security-Centric Diffusion Models

Changjiang Li, Ren Pang, Bochuan Cao, Jinghui Chen, Ting Wang

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Primary Area: societal considerations including fairness, safety, privacy

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Diffusion Models, Adversarial Purification, Backdoor Attacks

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: DIFF2 is a backdoor attack on diffusion models, highlighting risks in their use for critical applications (e.g., adversarial purification and robustness certification)

Abstract: Diffusion models have been employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. Meanwhile, the prohibitive training costs often make the use of pre-trained diffusion models an attractive practice. The tension between the intended use of these models and their unvalidated nature raises significant security concerns that remain largely unexplored. To bridge this gap, we present DIFF2, a novel backdoor attack tailored to security-centric diffusion models. Essentially, DIFF2 superimposes a diffusion model with a malicious diffusion-denoising process, guiding inputs embedded with specific triggers toward an adversary-defined distribution, while preserving the normal process for other inputs. Our case studies on adversarial purification and robustness certification show that DIFF2 substantially diminishes both post-purification and certified accuracy across various benchmark datasets and diffusion models, highlighting the potential risks of utilizing pre-trained diffusion models as defensive tools. We further explore possible countermeasures, suggesting promising avenues for future research.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6975

Loading