Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness

Xianglu Wang; Hu Ding

Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness

Xianglu Wang, Hu Ding

Published: 22 Jan 2025, Last Modified: 09 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Adversarial training, Continual learning, Catastrophic forgetting

Abstract: In recent years, there has been an explosion of research into developing robust deep neural networks against adversarial examples. As one of the most successful methods, Adversarial Training (AT) has been widely studied before, but there is still a gap to achieve promising clean and robust accuracy for many practical tasks. In this paper, we consider the AT problem from a new perspective which connects it to catastrophic forgetting in continual learning (CL). Catastrophic forgetting is a phenomenon in which neural networks forget old knowledge upon learning a new task. Although AT and CL are two different problems, we show that they actually share several key properties in their training processes. Specifically, we conduct an empirical study and find that this forgetting phenomenon indeed occurs in adversarial robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and TinyImageNet) and perturbation models ($\ell_{\infty}$ and $\ell_{2}$). Based on this observation, we propose a novel method called Adaptive Multi-teachers Self-distillation (AMS), which leverages a carefully designed adaptive regularizer to mitigate the forgetting by aligning model outputs between new and old ``stages''. Moreover, our approach can be used as a unified method to enhance multiple different AT algorithms. Our experiments demonstrate that our method can significantly enhance robust accuracy and meanwhile preserve high clean accuracy, under several popular adversarial attacks (e.g., PGD, CW, and Auto Attacks). As another benefit of our method, we discover that it can largely alleviate the robust overfitting issue of AT in our experiments.

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 4315

Loading