Keywords: Adversarial advertisement, text-to-image diffusion models, unnoticeable advertisement, robust advertisement, heavy-tailed phase-type distribution, mollifier theory
TL;DR: An adversarial advertisement framework that attacks text-to-image diffusion models so that they robustly generate images containing the desired advertisement content
Abstract: As text-to-image diffusion models (T2I DMs) gain popularity, there is growing interest in adversarial advertisement, in which an attacker compromises a T2I DM so that it generates images containing target product brands in response to users' non-advertising input prompts. However, two challenging problems in adversarial advertisement for T2I DMs remain unsolved: making the advertisement imperceptible and making it robust. First, an estimation algorithm for a multivariate continuously scaled phase-type with Lévy distribution is designed to capture the intrinsic distribution of natural sentences. Pushing non-advertising prompts toward dense regions of the estimated distribution makes the perturbed prompts indistinguishable from natural prompts that contain advertisements. Theoretical analysis validates the convergence of the estimate to the empirical distribution of natural prompts with advertisements. Second, a novel masked parameter smoothing method based on mollification theory is developed to derive a smooth T2I DM with a dimension-invariant certified guarantee of adversarial-advertisement robustness against model fine-tuning in high-dimensional parameter space, while the masking reduces the loss of model utility. Theoretical analysis shows that smooth T2I DMs still yield adversarial advertisements after model fine-tuning, provided the fine-tuned parameters stay within the certified radius.
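To make the idea of masked parameter smoothing concrete, the following is a minimal sketch and not the paper's method. It assumes a Gaussian kernel as the smoothing kernel, a toy one-layer model in place of a T2I DM, and a hand-picked binary mask; the names `masked_smoothed_output`, `toy_model`, `sigma`, and `n_samples` are all hypothetical. The smoothed model is approximated by averaging outputs over random perturbations applied only to the masked parameters.

```python
import numpy as np

def masked_smoothed_output(f, theta, x, mask, sigma=0.1, n_samples=256, seed=0):
    """Monte Carlo estimate of E[f(theta + mask * eps, x)], eps ~ N(0, sigma^2 I).

    Assumption: a Gaussian kernel stands in for the mollifier. Only the
    parameters where mask == 1 are perturbed; the rest stay fixed.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        # Noise is zeroed on unmasked coordinates, so they are left unsmoothed.
        noise = rng.normal(0.0, sigma, size=theta.shape) * mask
        outputs.append(f(theta + noise, x))
    return np.mean(outputs, axis=0)

def toy_model(theta, x):
    """Hypothetical stand-in for a model: a single tanh unit."""
    return np.tanh(x @ theta)

theta = np.array([0.5, -1.0, 2.0])   # toy parameters
mask = np.array([1.0, 0.0, 1.0])     # smooth the first and third parameters only
x = np.array([[1.0, 2.0, 3.0]])      # one toy input

smoothed = masked_smoothed_output(toy_model, theta, x, mask)
```

With an all-zero mask the function reduces to the unsmoothed model, which illustrates the trade-off the abstract describes: the fewer parameters are smoothed, the less the model's original behavior is altered.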
Supplementary Material: zip
Primary Area: generative models
Submission Number: 21412