Pixel2Feature Attack (P2FA): Rethinking the Perturbed Space to Enhance Adversarial Transferability

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-SA 4.0
TL;DR: We shift the perturbation from pixel space to feature space and perturb important features multiple times along the direction of feature importance.
Abstract: Adversarial examples have been shown to deceive Deep Neural Networks (DNNs), raising widespread concerns about this security threat. More critically, because different DNN models share critical features, feature-level attacks can generate transferable adversarial examples that deceive black-box models in real-world scenarios. Nevertheless, we theoretically uncover the principle behind the limited transferability of existing feature-level attacks: their attack effect is essentially equivalent to perturbing features in a single step along the direction of feature importance in the feature space, despite performing multiple perturbations in the pixel space. This finding indicates that existing feature-level attacks disrupt features inefficiently through multiple pixel-space perturbations. To address this problem, we propose P2FA, which efficiently perturbs features multiple times. Specifically, we directly shift the perturbed space from pixel space to feature space. We then perturb the features multiple times, rather than just once, in the feature space under the guidance of feature importance, improving the efficiency of disrupting the critical shared features. Finally, we invert the perturbed features back to pixel space to generate more transferable adversarial examples. Extensive experimental results demonstrate the superior transferability of P2FA over State-Of-The-Art (SOTA) attacks.
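To make the three steps in the abstract concrete (estimate feature importance, perturb features repeatedly in feature space, then invert back to pixels), here is a minimal PyTorch sketch. It is an illustrative reading of the abstract only, not the authors' released implementation: the FIA-style aggregated-gradient estimate of feature importance, the MSE feature-matching loss used for the inversion step, the sign-step updates, and every name and hyperparameter (p2fa_sketch, feat_steps, feat_lr, agg_samples, etc.) are assumptions.

```python
# Minimal sketch of the P2FA idea described in the abstract (assumptions:
# FIA-style aggregated gradients for feature importance, a fixed intermediate
# layer, MSE feature matching for the pixel-space inversion step).
import torch
import torch.nn.functional as F

def p2fa_sketch(model, layer, x, y, eps=16/255, pixel_steps=10,
                feat_steps=5, feat_lr=0.1, agg_samples=8):
    x = x.detach()
    feats = {}
    handle = layer.register_forward_hook(
        lambda m, i, o: feats.__setitem__("f", o))

    # 1) Estimate feature importance (assumed FIA-style): average the gradient
    #    of the true-class logit w.r.t. the intermediate features over
    #    randomly masked copies of the clean input.
    importance = 0.0
    for _ in range(agg_samples):
        mask = (torch.rand_like(x) > 0.3).float()          # random pixel dropout
        xm = (x * mask).requires_grad_(True)
        logit = model(xm).gather(1, y[:, None]).sum()
        importance = importance + torch.autograd.grad(logit, feats["f"])[0]
    importance = importance / agg_samples

    # 2) Perturb the features multiple times in feature space, stepping
    #    against the importance direction to suppress critical shared features.
    with torch.no_grad():
        model(x)
    f_target = feats["f"].detach().clone()
    for _ in range(feat_steps):
        f_target = f_target - feat_lr * importance.sign()

    # 3) Invert the perturbed features back to pixel space: optimize the
    #    adversarial image so its features approach the perturbed target,
    #    within an L_inf budget around the clean image.
    x_adv = x.clone()
    for _ in range(pixel_steps):
        x_adv.requires_grad_(True)
        model(x_adv)
        loss = F.mse_loss(feats["f"], f_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv - (1.5 * eps / pixel_steps) * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    handle.remove()
    return x_adv
```

As a hypothetical usage example, one might call this on a torchvision ResNet-50 with layer=model.layer2 and a batch of normalized-to-[0,1] images; the layer choice and step budget would need tuning against the paper's reported settings.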
Lay Summary: Deep neural networks can be tricked by "adversarial examples"—slightly altered inputs that cause errors. However, existing methods for creating these examples are inefficient, as they disrupt key features by modifying images at the pixel level. We propose a new method, P2FA, which directly targets and disrupts these key features multiple times within the feature space, significantly improving attack effectiveness. P2FA generates adversarial examples that more effectively deceive various AI models, surpassing the best existing methods. Our research exposes weaknesses in AI systems, helping developers create safer, more reliable models to protect AI-dependent technologies from real-world threats.
Primary Area: Deep Learning->Robustness
Keywords: Transferability, Adversarial Example, AI Security
Submission Number: 9386