Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training

Binghui Li; Yuanzhi Li

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: learning theory

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: deep learning theory, adversarial robustness, adversarial training, clean generalization and robust overfitting

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We provide a theoretical understanding of why clean generalization and robust overfitting both happen in adversarial training.

Abstract: Adversarial training is a standard method for training deep neural networks to be robust to adversarial perturbations. Similar to the surprising $\textit{clean generalization}$ ability in the standard deep learning setting, neural networks trained by adversarial training also generalize well to $\textit{unseen clean data}$. However, in contrast with clean generalization, although adversarial training achieves low robust training error, a significant $\textit{robust generalization gap}$ remains, which prompts us to explore what mechanism leads to both $\textit{clean generalization and robust overfitting (CGRO)}$ during the learning process. In this paper, we provide a theoretical understanding of this puzzling phenomenon (CGRO) through $\textit{feature learning theory}$. Specifically, we prove that, under our theoretical framework (a patch-structured dataset and a one-hidden-layer CNN model), a $\textit{three-stage phase transition}$ occurs in the adversarial training dynamics, and the network learner provably partially learns the true feature but exactly memorizes the spurious features from adversarial examples of the training data, which results in the CGRO phenomenon. In addition, under a more general data assumption, we show the efficiency of the CGRO classifier from the perspective of $\textit{representation complexity}$. On the empirical side, we also verify our theoretical analysis of the learning process on real-world vision data.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7273
