Characterizing Robust Overfitting in Adversarial Training via Cross-Class Features

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Adversarial Training, Robust Overfitting, Cross-Class Features
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a novel understanding of robust overfitting from the perspective of feature attribution.
Abstract: Adversarial training (AT) is considered one of the most effective methods for making deep neural networks robust to adversarial attacks. However, AT can lead to a phenomenon known as robust overfitting, in which the test robust error gradually increases during training, resulting in a large robust generalization gap. In this paper, we present a novel interpretation of robust overfitting from the perspective of feature attribution. We find that at the best checkpoint of AT, the model tends to involve more cross-class features, i.e., features shared by multiple classes, in its decision-making process, and that these features are useful for robust classification. However, as AT further squeezes the training robust loss, the model tends to make decisions based on more class-specific features, giving rise to robust overfitting. We also provide theoretical evidence for this understanding using a synthetic data model. In addition, our understanding justifies why knowledge distillation helps mitigate robust overfitting, and we further propose a weight-average guided knowledge distillation AT approach for improved robustness.
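The abstract only names the proposed weight-average guided knowledge distillation AT approach; the sketch below is an illustrative reading of that idea, not the paper's implementation. It uses a toy linear model in which an exponential-moving-average (weight-averaged) copy of the student acts as the teacher, supplying soft targets on FGSM adversarial examples. All variable names and hyperparameters (`eps`, `alpha`, `tau`, temperature `T`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    # Temperature-scaled, numerically stabilized softmax.
    z = z / T - (z / T).max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy linearly separable 2-class data standing in for a real dataset.
X = rng.normal(size=(64, 10))
y = np.argmax(X @ rng.normal(size=(10, 2)), axis=1)
Y = np.eye(2)[y]

W = np.zeros((10, 2))        # student weights
W_avg = np.zeros((10, 2))    # weight-averaged (EMA) teacher
eps, lr, tau, alpha, T = 0.1, 0.1, 0.99, 0.5, 2.0

for _ in range(200):
    # Inner maximization: one FGSM step on the cross-entropy loss.
    grad_x = (softmax(X @ W) - Y) @ W.T
    X_adv = X + eps * np.sign(grad_x)

    # Outer minimization: hard labels mixed with the EMA teacher's
    # soft targets (temperature kept, T^2 gradient factor omitted).
    p_adv = softmax(X_adv @ W)
    target = (1 - alpha) * Y + alpha * softmax(X_adv @ W_avg, T=T)
    W -= lr * X_adv.T @ (p_adv - target) / len(X)

    # Update the weight-averaged teacher after each student step.
    W_avg = tau * W_avg + (1 - tau) * W

clean_acc = (np.argmax(X @ W, axis=1) == y).mean()
```

The design choice worth noting is that the teacher is not a fixed pre-trained network but a running average of the student itself, so the distillation target smooths over the late-training checkpoints where, per the abstract, the model drifts toward class-specific features.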
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 843