Abstract: Deep neural networks have enhanced face synthesis detection in discriminating Artificial Intelligence Generated Content (AIGC). However, their security is threatened by the injection of carefully crafted triggers during model training (i.e., backdoor attacks). Although existing backdoor defenses and manual data selection can mitigate attacks that use human-eye-sensitive triggers, such as patches or adversarial noise, the more challenging natural backdoor triggers remain insufficiently researched. To further investigate natural triggers, we propose a novel analysis-by-synthesis backdoor attack against face synthesis detection models, which embeds natural triggers in the latent space. We study this backdoor vulnerability from two perspectives: (1) Model Discrimination (Optimization-Based Trigger): we adopt a substitute detection model and find the trigger by minimizing the cross-entropy loss; (2) Data Distribution (Custom Trigger): we manipulate uncommon facial attributes in the long-tailed distribution to generate poisoned samples without supervision from detection models. Furthermore, to evaluate detection models against the latest AIGC, we utilize both the state-of-the-art StyleGAN and Stable Diffusion for trigger generation. Finally, these backdoor triggers introduce specific semantic features into the generated poisoned samples (e.g., skin textures and smile), which are more natural and robust.
Extensive experiments show that our method is superior to existing pixel-space backdoor attacks on three levels: (1) Attack Success Rate: achieving an attack success rate exceeding 99%, comparable to baseline methods, with less than 0.1% model accuracy drop and under 3% poisoning rate; (2) Backdoor Defense: showing superior robustness when faced with existing backdoor defenses (e.g., surpassing baseline methods by over 30% after a 15° rotation); (3) Human Inspection: being less human-eye-sensitive, according to a user study with 46 participants and a collection of 2,300 data points.
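As a rough illustration of the optimization-based trigger idea, the following minimal sketch searches for a latent-space offset by gradient descent on a cross-entropy loss. It is a toy under stated assumptions, not the method evaluated in the paper: the substitute detector is a hypothetical logistic model applied directly to latent codes (no StyleGAN or Stable Diffusion generator is involved), the target label is assumed to be "real", and the names `optimize_trigger`, `max_norm`, and all numeric settings are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_real(w, b, z):
    # Toy substitute detector (assumption): logistic score on a latent code.
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def mean_ce_to_real(w, b, latents, delta):
    # Mean cross-entropy against the assumed target label "real".
    total = 0.0
    for z in latents:
        p = prob_real(w, b, [zi + di for zi, di in zip(z, delta)])
        total += -math.log(max(p, 1e-12))
    return total / len(latents)


def optimize_trigger(w, b, latents, steps=200, lr=0.5, max_norm=3.0):
    # Gradient descent on a shared latent offset `delta` (the trigger).
    dim = len(w)
    delta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for z in latents:
            p = prob_real(w, b, [zi + di for zi, di in zip(z, delta)])
            # d/d(delta) of -log(p) for a logistic model is -(1 - p) * w.
            for i in range(dim):
                grad[i] -= (1.0 - p) * w[i] / len(latents)
        delta = [di - lr * gi for di, gi in zip(delta, grad)]
        # Keep the offset small (hypothetical stand-in for a naturalness constraint).
        norm = math.sqrt(sum(x * x for x in delta))
        if norm > max_norm:
            delta = [x * max_norm / norm for x in delta]
    return delta


random.seed(0)
dim = 8
w = [random.gauss(0, 1) for _ in range(dim)]
b = -2.0
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(32)]

delta = optimize_trigger(w, b, latents)
loss_before = mean_ce_to_real(w, b, latents, [0.0] * dim)
loss_after = mean_ce_to_real(w, b, latents, delta)
print(loss_before, loss_after)
```

In this toy setting the learned offset lowers the detector's cross-entropy toward the assumed target label on every latent code. A real pipeline would additionally decode the shifted latents with a generator and train the victim model on the resulting poisoned images.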