Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks

TMLR Paper 2876 Authors

15 Jun 2024 (modified: 08 Jul 2024) · Under review for TMLR · License: CC BY-SA 4.0
Abstract: Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions. Real-world applications of these models demand particular attention to their safety and fidelity, which has not yet been sufficiently explored. One fundamental question is whether existing T2I DMs are robust to variations of the input text. To answer this question, this work provides the first robustness evaluation of T2I DMs against real-world perturbations. Unlike malicious attacks that introduce contrived alterations to the input text, we consider a perturbation space spanned by realistic errors that humans can make (e.g., typos, glyph substitutions, phonetic misspellings) and develop adversarial attacks that generate worst-case perturbations for robustness evaluation. Given the inherent randomness of the generation process, we design four novel distribution-based objectives to mislead T2I DMs and optimize them in a black-box manner, without any knowledge of the model. Extensive experiments demonstrate the effectiveness of our method in attacking popular T2I DMs and reveal non-trivial robustness issues in these models. Moreover, we offer an in-depth analysis showing that our method is not specialized to attacking only the text encoder in T2I DMs.
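To make the perturbation space concrete, below is a minimal Python sketch of the three realistic error types the abstract names (typo, glyph, phonetic). The substitution maps and function names are illustrative assumptions, not the authors' implementation; in the paper, such perturbations are searched over with the distribution-based objectives in a black-box manner.

```python
import random

# Hypothetical illustration of the human-error perturbation space; the maps
# and helpers below are assumptions, not the authors' code.
GLYPH_MAP = {"a": "а", "e": "е", "o": "о", "l": "1"}    # visually similar glyphs
PHONETIC_MAP = {"ph": "f", "tion": "shun", "ck": "k"}   # sound-alike rewrites

def typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters, mimicking a keyboard slip."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def glyph(text: str, rng: random.Random) -> str:
    """Replace one character with a look-alike glyph."""
    candidates = [i for i, c in enumerate(text) if c in GLYPH_MAP]
    if not candidates:
        return text
    i = rng.choice(candidates)
    return text[:i] + GLYPH_MAP[text[i]] + text[i + 1:]

def phonetic(text: str, rng: random.Random) -> str:
    """Rewrite one sound-alike substring."""
    hits = [k for k in PHONETIC_MAP if k in text]
    if not hits:
        return text
    k = rng.choice(hits)
    return text.replace(k, PHONETIC_MAP[k], 1)

def perturb(prompt: str, seed: int = 0) -> str:
    """Apply one randomly chosen realistic perturbation to a prompt."""
    rng = random.Random(seed)
    op = rng.choice([typo, glyph, phonetic])
    return op(prompt, rng)

print(perturb("a photograph of an astronaut riding a horse", seed=3))
```

An attack would score each candidate perturbation by how far it shifts the model's output image distribution, then keep the worst-case candidates; the sketch above only shows how individual candidates could be generated.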
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yan_Liu1
Submission Number: 2876