DiffAdvMAP: Flexible Diffusion-Based Framework for Generating Natural Unrestricted Adversarial Examples
TL;DR: A flexible diffusion-based framework for generating natural and effective unrestricted adversarial examples.
Abstract: Unrestricted adversarial examples (UAEs) pose greater threats to deep neural networks (DNNs) than perturbation-based adversarial examples (AEs) because they can make extensive changes to images without being confined to a fixed-norm perturbation budget. Although current diffusion-based methods generate more natural UAEs than other unrestricted attack methods, their overall effectiveness is limited because they are designed for specific attack conditions. Moreover, the naturalness of UAEs still has room for improvement, as these methods mainly use diffusion models as strong priors to guide the generation process. This paper proposes a flexible framework named Diffusion-based Adversarial Maximum a Posteriori (DiffAdvMAP) to generate more natural UAEs for various scenarios. DiffAdvMAP formulates UAE generation as sampling images from a posterior distribution, which it approximates using the prior distribution of real data learned by the diffusion model; this enhances the naturalness of the UAEs. By incorporating an adversarial constraint that ensures attack effectiveness, DiffAdvMAP exhibits strong attack ability and robustness against defenses. A reconstruction constraint is designed to enhance its flexibility, allowing DiffAdvMAP to be tailored to various attack scenarios. Experimental results on ImageNet show that DiffAdvMAP achieves a better trade-off between image quality, flexibility, and transferability than baseline unrestricted adversarial attack methods.
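To make the posterior view concrete, a minimal sketch of the kind of MAP-style objective the abstract describes is shown below; the symbols ($x_0$ for the source image, $f$ for the target classifier, $\mathcal{L}_{\mathrm{adv}}$, $\mathcal{L}_{\mathrm{rec}}$, and the weights $\lambda_{\mathrm{adv}}$, $\lambda_{\mathrm{rec}}$) are illustrative assumptions, not notation taken from the paper:
$$x_{\mathrm{adv}} \approx \arg\max_{x}\; \underbrace{\log p_{\theta}(x)}_{\text{diffusion prior}} \;-\; \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}}\!\big(f(x),\, y\big) \;-\; \lambda_{\mathrm{rec}}\,\mathcal{L}_{\mathrm{rec}}\!\big(x,\, x_{0}\big)$$
Here the diffusion prior $\log p_{\theta}(x)$ keeps the sample natural, the adversarial term pushes the classifier toward a wrong prediction, and the reconstruction term controls how far the UAE may drift from the source image, which is what allows the framework to be tuned to different attack scenarios.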
Lay Summary: Traditional "AI-tricking images" (real images with tiny, invisible tweaks) can fool AI systems, but they are less dangerous because they only make small, limited changes. Unrestricted AI-tricking images (UAEs) are far riskier: they can completely alter the content of a picture (for example, its shapes, colors, or even its whole structure) to trick AI more effectively. Current methods, which rely on AI models that can generate realistic-looking images, create more natural UAEs, but they still have two big flaws. Limited use: they are designed only for specific situations. Not realistic enough: the fake images still look slightly off, because these methods use the AI's "existing knowledge" in a simplistic way. To fix these problems, we built a new tool called DiffAdvMAP. Here is how it works. Looks real: the tool fully accounts for the relationship between the AI's existing knowledge and the attack goal, producing fake images that look closer to real ones. Works reliably: these images consistently fool AI systems. The tool can also adjust its attack strategy for different needs. Tests show that DiffAdvMAP's fake images are harder to spot and more successful at tricking AI across various scenarios. Whether the measure is image quality, adapting to different situations, or attacking other AI models, it beats existing methods. In short, it makes fake images both sneaky and powerful.
Primary Area: Deep Learning->Robustness
Keywords: Unrestricted Adversarial Attacks, Diffusion Models, Flexible Unrestricted Adversarial Attacks
Submission Number: 6652