Defense Against Adversarial Attacks Using Topology Aligning Adversarial Training

Published: 01 Jan 2024, Last Modified: 13 Nov 2024. IEEE Trans. Inf. Forensics Secur. 2024. License: CC BY-SA 4.0
Abstract: Recent work has shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, in which an attacker perturbs an input example with human-imperceptible noise that easily fools the network into making incorrect predictions. This severely limits the application of deep learning in security-critical scenarios such as face authentication. Adversarial training (AT) is one of the most practical approaches to improving the robustness of DNNs. However, existing AT-based methods treat each training sample independently and thereby ignore the underlying topological structure of the training data. To this end, we take full advantage of this topology information and introduce a Topology Aligning Adversarial Training (TAAT) algorithm. TAAT encourages the trained model to maintain a consistent topological structure between the feature spaces of natural and adversarial examples. To ensure the stability and efficiency of topology alignment, we further introduce a novel Knowledge-Guided (KG) training scheme, which explicitly aligns the local logit outputs and global topological structures of the target model with those of a robust auxiliary model. To verify the effectiveness of the proposed method, we conduct extensive experiments on popular benchmark datasets (e.g., CIFAR and ImageNet) and evaluate robustness against state-of-the-art adversarial attacks (e.g., PGD and AutoAttack). The results demonstrate that the proposed method achieves superior robustness over previous state-of-the-art methods. Our code and pre-trained models are available at https://github.com/SkyKuang/TAAT.
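The abstract does not spell out the training objective, so the sketch below is only a rough illustration of the ideas it describes, written in PyTorch. Everything in it is an assumption rather than the paper's method: the pairwise cosine-similarity matrix as the proxy for the batch's "topological structure", the function names (`topology_matrix`, `taat_loss`), the weighting hyperparameters (`alpha`, `beta`, `tau`), and the convention that `model` and `teacher` each return a `(features, logits)` pair. The paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def topology_matrix(features):
    # Pairwise cosine-similarity matrix over a batch of features,
    # used here as a simple proxy for the batch's topological structure.
    z = F.normalize(features, dim=1)
    return z @ z.t()

def taat_loss(model, teacher, x_nat, x_adv, y, alpha=1.0, beta=1.0, tau=4.0):
    """One possible TAAT-style objective (a sketch, not the paper's exact loss):
    cross-entropy on adversarial examples, plus (i) topology alignment between
    natural and adversarial feature spaces and (ii) knowledge-guided terms that
    align logits and topology with a robust auxiliary (teacher) model."""
    feat_nat, logit_nat = model(x_nat)   # assumed: model returns (features, logits)
    feat_adv, logit_adv = model(x_adv)
    with torch.no_grad():
        feat_t, logit_t = teacher(x_adv)  # robust auxiliary model, frozen

    # Standard adversarial-training classification loss.
    ce = F.cross_entropy(logit_adv, y)

    # Topology alignment between natural and adversarial feature spaces.
    topo_align = F.mse_loss(topology_matrix(feat_adv), topology_matrix(feat_nat))

    # Knowledge-guided terms: local logit alignment (temperature-scaled KL)
    # and global topology alignment with the robust teacher.
    kl = F.kl_div(F.log_softmax(logit_adv / tau, dim=1),
                  F.softmax(logit_t / tau, dim=1),
                  reduction="batchmean") * tau * tau
    topo_kg = F.mse_loss(topology_matrix(feat_adv), topology_matrix(feat_t))

    return ce + alpha * topo_align + beta * (kl + topo_kg)
```

In a full training loop, `x_adv` would come from an inner maximization step such as PGD applied to `x_nat`, and this loss would replace the plain cross-entropy term of standard adversarial training.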