Abstract: Detectors are the core component of DeepFake detection, distinguishing real from fake content, but they are vulnerable to adversarial attacks. Existing attack strategies focus on injecting visible perturbations into forgeries, which can degrade image quality. In this work, we propose an invisible attack that evades DeepFake detectors while introducing minimal artifacts perceptible to humans. First, we encode the fake image into a latent code that captures editable, semantic features. Then, the latent code is optimized to fool the detector while preserving the high visual quality of the attacked image. Finally, we use a conditional diffusion model to reconstruct the adversarial fake face from the optimized latent code. We evaluate the attack against three spatial-domain and two frequency-domain detectors on faces synthesized by StyleGAN2 and StarGAN-V2. Experimental results demonstrate that the proposed attack achieves superior attack effectiveness and visual quality compared to existing competitors.
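As a rough illustration only (not the paper's exact algorithm), the latent-code optimization step might be sketched as follows, assuming a differentiable `decoder` that maps a latent code back to an image (e.g., the conditional reconstruction stage) and a `detector` that outputs a fakeness score; all names, the loss form, and the weight `lam` are placeholders introduced here for illustration:

```python
import torch

def attack_latent(z0, detector, decoder, steps=100, lr=0.01, lam=1.0):
    """Hedged sketch: optimize a latent code z so that the decoded image
    lowers the detector's "fake" score while z stays close to the
    original latent code z0 (a proxy for preserving visual quality)."""
    z = z0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = decoder(z)                    # reconstruct image from the latent code
        adv_loss = detector(x).mean()     # detector's fakeness score; minimize it
        fid_loss = torch.norm(z - z0)     # keep the latent near the original
        loss = adv_loss + lam * fid_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

The trade-off between evading the detector and preserving image quality is controlled here by the hypothetical weight `lam`; the actual objective and reconstruction procedure are those described in the paper.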