Frequency Attacks Based on Invertible Neural Networks

Published: 01 Jan 2025 · Last Modified: 25 Jul 2025 · IEEE Trans. Artif. Intell. 2025 · License: CC BY-SA 4.0
Abstract: Adversarial attacks reveal the vulnerability of deep-neural-network classifiers to carefully designed perturbations. Most existing attack methods add perturbations directly in pixel space, where the perturbations can be easily perceived by humans. To alleviate this problem, we propose a novel high-frequency attack based on invertible neural networks (HA-INN), which uses an INN to add perturbations in the high-frequency space of an image rather than in pixel space. In this way, we can fool the classifier while keeping the perturbations hard for humans to detect. Specifically, we use an INN to separate the high-frequency and low-frequency components of an image. The low-frequency components are guaranteed to reconstruct the original image, while the high-frequency components are replaced with resampled high-frequency latent variables carrying additional adversarial information. The low-frequency components and the adversarial high-frequency components are then passed back through the inverse of the INN to generate effective adversarial examples. Extensive experiments on two datasets (CIFAR-10 and CIFAR-100) show that our method generates misleading, cross-architecture transferable adversarial examples with greatly reduced computational requirements. Under the white-box setting, the attack success rates on CIFAR-10 and CIFAR-100 are 99.8% and 99.74%, respectively. Furthermore, under the black-box setting, the adversarial examples generated by our method are more effective than those of other methods.
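The pipeline described in the abstract (forward INN pass splitting an image into low- and high-frequency components, adversarial resampling of the high-frequency latent, inverse pass to reconstruct the adversarial image) can be illustrated with a minimal sketch. The sketch below assumes a toy single-block affine-coupling INN and a generic PyTorch classifier; `ToyINN`, `ha_inn_attack`, the step count, and the step size are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the HA-INN attack idea, assuming a toy affine-coupling INN
# (one coupling block) and any differentiable PyTorch classifier. Names and
# hyperparameters here are illustrative assumptions only.
import torch
import torch.nn as nn

class ToyINN(nn.Module):
    """One affine coupling block: splits the flattened input into a kept
    'low-frequency' half and a transformed 'high-frequency' half, and is
    exactly invertible."""
    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        self.net = nn.Sequential(nn.Linear(half, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * half))

    def forward(self, x):
        x_low, x_high = x.chunk(2, dim=1)
        log_s, t = self.net(x_low).chunk(2, dim=1)
        z_high = x_high * torch.exp(log_s) + t   # "high-frequency" latent code
        return x_low, z_high

    def inverse(self, x_low, z_high):
        log_s, t = self.net(x_low).chunk(2, dim=1)
        x_high = (z_high - t) * torch.exp(-log_s)
        return torch.cat([x_low, x_high], dim=1)

def ha_inn_attack(inn, classifier, x, y, steps=10, step_size=0.05):
    """Keep the low-frequency code fixed, perturb only the high-frequency code
    to increase classification loss, then invert back to image space."""
    x_low, z_high = inn(x.flatten(1))
    # Resample around the original high-frequency latent, then optimize it.
    z_adv = (z_high + 0.01 * torch.randn_like(z_high)).detach().requires_grad_(True)
    for _ in range(steps):
        x_adv = inn.inverse(x_low.detach(), z_adv).view_as(x)
        loss = nn.functional.cross_entropy(classifier(x_adv), y)
        grad, = torch.autograd.grad(loss, z_adv)
        z_adv = (z_adv + step_size * grad.sign()).detach().requires_grad_(True)
    return inn.inverse(x_low.detach(), z_adv).view_as(x).detach()
```

For a CIFAR-10 batch, `x` would be a tensor of shape (B, 3, 32, 32) flattened to 3072 features, and `classifier` any trained model taking such images; only the high-frequency latent is modified, which is what keeps the perturbation out of the directly reconstructed low-frequency content.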