FeConDefense: Reversing adversarial attacks via feature consistency loss

Published: 01 Jan 2023, Last Modified: 24 Jul 2025 · Comput. Commun. 2023 · CC BY-SA 4.0
Abstract: Existing adversarial defense methods often employ adversarial training or data pre-processing techniques to defend against adversarial attacks. However, adversarial training is burdensome, as it requires finding a single representation that works for all possible attacks, and excessive training may decrease the network model's classification accuracy. Data pre-processing methods, meanwhile, focus on eliminating adversarial perturbations by modifying the input samples, but they do not consider the internal relationships between reverse perturbations and adversarial examples, so the generated modifications lack specificity. In this paper, we propose a novel adversarial defense method, named FeConDefense, which aims to reverse adversarial attacks by analyzing the intrinsic features of images. Specifically, we first extract two different features of each adversarial example using two different network models. Then, we design a novel feature consistency loss to measure the distance between these two features. Finally, we integrate the feature consistency loss into contrastive learning to generate a reverse perturbation for each adversarial example. Comprehensive experiments against different adversarial attack methods demonstrate that FeConDefense achieves state-of-the-art results in reversing adversarial perturbations and improving the robustness of image classifiers.
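The abstract describes the pipeline only at a high level. The sketch below is one plausible reading of it, not the authors' implementation: it assumes PyTorch, uses two standard torchvision backbones as stand-ins for the "two different network models", takes cosine distance as the feature consistency loss, and optimizes the reverse perturbation with plain sign-gradient descent; the paper's contrastive-learning formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Hypothetical choice of feature extractors; the paper does not specify
# which backbones play the role of the two network models.
model_a = models.resnet18(weights=None).eval()
model_b = models.vgg11(weights=None).eval()

def extract_features(model, x):
    """Return a flattened feature vector for each image in the batch.
    Here we simply use the model's output logits as features; the paper
    may instead use intermediate-layer activations."""
    return model(x).flatten(1)

def feature_consistency_loss(f_a, f_b):
    """Assumed form of the loss: mean cosine distance between the two
    L2-normalized feature vectors (the paper's exact metric may differ)."""
    f_a = F.normalize(f_a, dim=1)
    f_b = F.normalize(f_b, dim=1)
    return (1.0 - (f_a * f_b).sum(dim=1)).mean()

def reverse_perturbation(x_adv, steps=40, step_size=1 / 255, eps=8 / 255):
    """Iteratively optimize a bounded reverse perturbation delta that,
    added to the adversarial example, minimizes the feature consistency loss."""
    delta = torch.zeros_like(x_adv, requires_grad=True)
    for _ in range(steps):
        x = (x_adv + delta).clamp(0, 1)
        loss = feature_consistency_loss(
            extract_features(model_a, x),
            extract_features(model_b, x),
        )
        loss.backward()
        with torch.no_grad():
            # Gradient-descent step on delta, kept within an L-infinity ball.
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x_adv + delta).clamp(0, 1).detach()

# Usage: x_adv is a batch of (possibly adversarial) images in [0, 1].
x_adv = torch.rand(2, 3, 224, 224)
x_restored = reverse_perturbation(x_adv)
```

The step size, budget `eps`, and number of iterations above are illustrative placeholders rather than values reported in the paper.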