EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second

Published: 20 Jul 2024, Last Modified: 29 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Text-to-image (T2I) diffusion models enjoy great popularity, and many individuals and companies build their applications on publicly released T2I diffusion models. Previous studies have demonstrated that backdoor attacks can cause T2I diffusion models to generate unsafe target images in response to textual triggers. However, existing backdoor attacks typically demand substantial tuning data for poisoning, which limits their practicality and can degrade the overall performance of T2I diffusion models. To address these issues, we propose EvilEdit, a **training-free** and **data-free** backdoor attack against T2I diffusion models. EvilEdit directly edits the projection matrices in the cross-attention layers to achieve projection alignment between a trigger and the corresponding backdoor target. We preserve the functionality of the backdoored model using a protected whitelist, ensuring that the semantics of non-trigger words are not accidentally altered by the backdoor. We also propose a visual-target variant, EvilEdit$_{VTA}$, which enables adversaries to use specific images as backdoor targets. Empirical experiments on Stable Diffusion demonstrate that EvilEdit can backdoor T2I diffusion models within **one second** with up to a 100% success rate. Furthermore, EvilEdit modifies only 2.2% of the parameters and maintains the model's performance on benign prompts. Our code is available at [https://github.com/haowang-cqu/EvilEdit](https://github.com/haowang-cqu/EvilEdit).
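To make the "projection alignment" idea concrete, below is a minimal PyTorch sketch of a ridge-regularized closed-form edit of a single cross-attention projection matrix, in the spirit of what the abstract describes. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `edit_projection`, the regularization weight `lam`, and the simplification of treating the trigger and target prompts as single embedding vectors are all hypothetical.

```python
import torch

def edit_projection(W: torch.Tensor,
                    c_trigger: torch.Tensor,
                    c_target: torch.Tensor,
                    lam: float = 1.0) -> torch.Tensor:
    """Hypothetical closed-form edit of one cross-attention projection matrix.

    Solves   min_{W'} ||W' c_trigger - W c_target||^2 + lam * ||W' - W||_F^2
    in closed form:
        W' = (lam * W + (W c_target) c_trigger^T) (lam * I + c_trigger c_trigger^T)^{-1}
    The rank-one update redirects the trigger embedding to the projection the
    clean weights assign to the target, while the ridge term keeps W' close to
    W so non-trigger embeddings are largely unaffected.
    """
    d_in = W.shape[1]
    c_tr = c_trigger.reshape(d_in, 1)    # trigger text embedding, as a column vector
    c_ta = c_target.reshape(d_in, 1)     # target text embedding, as a column vector
    lhs = lam * W + (W @ c_ta) @ c_tr.T  # shape (d_out, d_in)
    rhs = lam * torch.eye(d_in, dtype=W.dtype, device=W.device) + c_tr @ c_tr.T
    # rhs is symmetric, so lhs @ rhs^{-1} equals solve(rhs, lhs.T).T
    return torch.linalg.solve(rhs, lhs.T).T
```

Applied to the key and value projections (`to_k`, `to_v`) of each cross-attention layer in the UNet, the only weights through which the text embedding enters the image pathway, such an edit touches only a small fraction of the model's parameters, consistent with the 2.2% figure reported above. A whitelist check, e.g., verifying that the edited projections of protected words stay close to their originals, would guard against accidental semantic drift.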
Primary Subject Area: [Generation] Social Aspects of Generative AI
Secondary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: Our work addresses the security of multimedia and multimodal processing, specifically the robustness of text-to-image (T2I) diffusion models. As T2I models gain widespread adoption for generating images from natural-language descriptions, securing them against malicious manipulation becomes crucial. Our work directly addresses this emerging challenge by introducing a novel backdoor attack, EvilEdit, which uses a training-free model-editing approach to embed backdoors into T2I models. This method not only exposes vulnerabilities in widely used models such as Stable Diffusion but also sets a precedent for considering security throughout multimedia model development and usage. The significance of EvilEdit to the ACM Multimedia community lies in its model-editing approach, which requires no training or fine-tuning and therefore offers a fast, efficient means of demonstrating potential security flaws. Furthermore, our empirical experiments validate that EvilEdit compromises T2I models without degrading their performance on benign prompts, highlighting the need for comprehensive security measures in multimodal processing systems. This contribution both illuminates a critical vulnerability in multimedia processing and paves the way for future research into secure model development and deployment.
Supplementary Material: zip
Submission Number: 152