Abstract: In recent years, deep learning has been widely applied across many fields. However, the emergence of adversarial examples has revealed a critical vulnerability, making deep learning models susceptible to potential attacks. Although numerous adversarial attack methods have been proposed, they are subject to significant limitations. Most existing methods generate adversarial examples by adding perturbations to clean examples, which confines the results to the neighborhood of those examples and prevents the generation of diverse and natural adversarial examples. Moreover, because of these additive perturbations, such methods exhibit limited robustness against adversarial defenses such as image denoising. To tackle these challenges, this paper proposes a three-stage attack framework based on an adversarial diffusion model, named AT-Diff (Adversarial Transfer on Diffusion model). First, to address the limited robustness of traditional adversarial examples, we use a diffusion model to learn the adversarial example distribution of the target model and generate highly realistic adversarial examples from scratch, without any additional perturbations, thereby significantly improving their performance under defense measures. Second, we design a Temporal Adaptive Loss (TAL) that dynamically balances the visual loss and the adversarial loss according to the current time step, so that the generated adversarial examples gradually gain adversarial efficacy while maintaining high visual quality. Extensive experiments on the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that our method achieves strong imperceptibility while exhibiting greater robustness.
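The abstract does not give the exact form of the Temporal Adaptive Loss. As a minimal sketch of one plausible reading, assuming a time-dependent convex combination (the schedule $\lambda(t)$ and the symbols $\mathcal{L}_{\mathrm{vis}}$, $\mathcal{L}_{\mathrm{adv}}$ are illustrative, not the paper's definitions):

$$\mathcal{L}_{\mathrm{TAL}}(t) \;=\; \lambda(t)\,\mathcal{L}_{\mathrm{vis}} \;+\; \bigl(1 - \lambda(t)\bigr)\,\mathcal{L}_{\mathrm{adv}}, \qquad \lambda(t) = \frac{t}{T},$$

where $t$ is the current reverse-diffusion step and $T$ the total number of steps, so the weighting shifts smoothly from visual fidelity toward adversarial strength as denoising proceeds, consistent with the stated goal of gradually enhancing adversarial efficacy while preserving visual quality.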