Understanding Input Transformation-Based Attacks via Target Function Space Expansion

18 Sept 2025 (modified: 19 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Mechanistic Interpretability, Adversarial Attack, Adversarial Transferability
TL;DR: Input transformation-based attacks can improve model transformation invariance via in-domain data, thereby enhancing adversarial transferability.
Abstract: Research on transfer-based adversarial attacks provides critical insights into distinctions among Deep Neural Networks (DNNs), revealing their vulnerabilities when exposed to unseen noise. Among transfer-based adversarial attacks, input transformation-based attacks are popular due to their simplicity and effectiveness. However, their mechanisms remain poorly understood, potentially hindering advancements in DNNs. This work explores the mechanism behind these attacks, suggesting that 1) when trained with input transformations, models improve transformation invariance by capturing diverse features from the transformed inputs rather than by learning transformation-invariant features; consequently, given a surrogate model $f_s$ trained with input transformations $\varphi$, adversarial attacks can leverage these transformations to expand the target function space $f_s \circ \varphi$, effectively and rapidly improving adversarial transferability because domain shifts are mitigated; 2) input transformation-based attacks enhance adversarial transferability by expanding the target function space, so the transformations effectively act as modifications to the target model, improving attack robustness against diverse models; and 3) $L_2$-normalization should be incorporated into the attack paradigm to mitigate the gradient imbalance that arises during adversarial example generation, since different transformations induce domain shifts of varying magnitude. Based on these findings, we design a simple transformation-based attack, SimAttack, which achieves a mean attack success rate of 95.4\% across 12 models; some of the generated examples are also effective against GPT-4.1.
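To make the described paradigm concrete, the sketch below shows one plausible reading of points 2) and 3): at each step, gradients are computed on several randomly transformed copies of the adversarial example (attacking the expanded target function space $f_s \circ \varphi$) and $L_2$-normalized per transformation before averaging. This is a minimal illustrative sketch in PyTorch, not SimAttack itself; `transform`, the step sizes, and the iteration counts are hypothetical placeholders, and `transform` is assumed to be a shape-preserving random augmentation (e.g., random resize-and-pad back to the original size).

```python
import torch
import torch.nn.functional as F

def transformation_based_attack(model, x, y, transform,
                                eps=16 / 255, alpha=1.6 / 255,
                                steps=10, num_transforms=5):
    """Sketch of an iterative, L-inf input transformation-based attack.

    Each step averages L2-normalized gradients taken on `num_transforms`
    randomly transformed copies of the current adversarial example, then
    applies a signed ascent step and projects back into the eps-ball.
    `transform` is a hypothetical shape-preserving random augmentation.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        grad = torch.zeros_like(x_adv)
        for _ in range(num_transforms):
            # Attack the expanded target function f_s o phi: the loss is
            # evaluated on a transformed copy of the adversarial example.
            x_t = transform(x_adv).detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_t), y)
            g = torch.autograd.grad(loss, x_t)[0]
            # Per-transformation L2 normalization: transformations inducing
            # larger domain shifts yield larger raw gradients; normalizing
            # keeps any single view from dominating the averaged direction.
            g_norm = g.flatten(1).norm(dim=1).clamp_min(1e-12)
            grad += g / g_norm.view(-1, 1, 1, 1)
        x_adv = x_adv + alpha * (grad / num_transforms).sign()
        # Project onto the L-inf eps-ball around x and the valid pixel range.
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0.0, 1.0)
    return x_adv.detach()
```

The per-transformation normalization is the part the abstract singles out: without it, transformations that cause larger domain shifts contribute disproportionately large gradients, biasing the averaged update toward a few views of the surrogate.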
Primary Area: interpretability and explainable AI
Submission Number: 10117