How to Backdoor Diffusion Models?

Published: 04 Mar 2023, Last Modified: 26 Mar 2024 (ICLR 2023 BANDS Oral)
Keywords: backdoor, diffusion model, trustworthy
TL;DR: We propose BadDiffusion, a new backdoor attack on diffusion models.
Abstract: Diffusion models are state-of-the-art deep-learning-empowered generative models trained on the principle of learning forward and reverse diffusion processes via progressive noise addition and denoising. To gain a better understanding of their limitations and potential risks, this paper presents the first study of the robustness of diffusion models against backdoor attacks. Specifically, we propose $\textbf{BadDiffusion}$, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model behaves just like an untampered generator on regular data inputs, while generating a targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dire for downstream tasks and applications built upon the problematic model. Our extensive experiments across various backdoor attack settings show that $\textbf{BadDiffusion}$ consistently yields compromised diffusion models with high utility and target specificity. Even worse, $\textbf{BadDiffusion}$ can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore possible countermeasures for risk mitigation. Our results call attention to the potential risks and possible misuse of diffusion models. Our code is available at https://github.com/IBM/BadDiffusion.
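To make the abstract's idea concrete, below is a minimal PyTorch sketch of a backdoor-poisoned DDPM training step in the spirit of BadDiffusion: a fraction of each batch diffuses toward the attacker-chosen target under a compromised forward process that blends in a trigger pattern, while the rest trains normally. All names here (`poisoned_ddpm_step`, the `model(x_t, t)` signature, the trigger scaling, `poison_rate`) are illustrative assumptions for exposition, not the repository's API or the paper's exact objective.

```python
# Hypothetical sketch of BadDiffusion-style poisoned training; the exact
# objective and trigger coupling in the paper/repo may differ in details.
import torch
import torch.nn.functional as F

def poisoned_ddpm_step(model, x0, target, trigger, alphas_cumprod,
                       poison_rate=0.1):
    """One DDPM training step with backdoor data poisoning (illustrative).

    model:          noise predictor eps_theta(x_t, t)   [assumed signature]
    x0:             clean images, shape (B, C, H, W)
    target:         attacker-chosen output image, shape (C, H, W)
    trigger:        trigger pattern g, shape (C, H, W)
    alphas_cumprod: 1-D tensor of cumulative alpha_bar_t values, length T
    """
    B, T = x0.size(0), alphas_cumprod.size(0)
    t = torch.randint(0, T, (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    eps = torch.randn_like(x0)

    # Mark a random fraction of the batch as poisoned.
    poisoned = (torch.rand(B, device=x0.device) < poison_rate).view(B, 1, 1, 1)

    # Poisoned samples diffuse toward the attacker's target image...
    x_start = torch.where(poisoned, target.expand_as(x0), x0)
    # ...under a compromised forward process that blends in the trigger,
    # scaled (an assumption here) to vanish at t=0 and dominate at t=T.
    x_t = (a_bar.sqrt() * x_start
           + poisoned * (1.0 - a_bar.sqrt()) * trigger
           + (1.0 - a_bar).sqrt() * eps)

    # Same noise-prediction loss as clean DDPM training, which is why
    # utility on trigger-free inputs is largely preserved.
    return F.mse_loss(model(x_t, t), eps)
```

At sampling time the attacker would stamp the same trigger onto the initial Gaussian noise: trigger-free noise yields ordinary samples, while triggered noise steers the reverse process toward the target, matching the behavior the abstract describes.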
Community Implementations: [1 code implementation on CatalyzeX](https://www.catalyzex.com/paper/arxiv:2212.05400/code)