Perturbation Deterioration: The Other Side of Catastrophic Overfitting

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. ICLR 2022 Submission.
Abstract: Our goal is to understand why robust accuracy abruptly drops to zero after conducting FGSM-style adversarial training for too long. While this phenomenon is commonly explained as overfitting, we observe that it is a twin process: not only does the model catastrophically overfit to one type of perturbation, but the perturbation also deteriorates into random noise. For example, at the same epoch when the FGSM-trained model catastrophically overfits, its generated perturbations deteriorate into random noise. Intuitively, once the generated perturbations become weak and inadequate, the model is misguided into overfitting those weak attacks and fails to defend against strong ones. In light of our analyses, we propose APART, an adaptive adversarial training method that parameterizes perturbation generation and progressively strengthens it. In our experiments, APART successfully prevents perturbation deterioration and catastrophic overfitting. Moreover, APART significantly improves model robustness while maintaining the same efficiency as FGSM-style methods; e.g., on the CIFAR-10 dataset, APART achieves 53.89% accuracy under the PGD-20 attack and 49.05% accuracy under AutoAttack.
One-sentence Summary: We reveal that the other side of catastrophic overfitting is perturbation underfitting, and propose a novel method to prevent catastrophic overfitting by alleviating perturbation deterioration.
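For context, below is a minimal PyTorch-style sketch of one single-step (FGSM-style) adversarial training iteration, the setting in which catastrophic overfitting is observed. The hyperparameters (eps = 8/255, alpha = 10/255), the random start, and the helper name fgsm_train_step are illustrative assumptions for a common FGSM-with-random-start recipe; this is not the paper's APART method, which additionally parameterizes and progressively strengthens the perturbation generation.

import torch
import torch.nn.functional as F

def fgsm_train_step(model, optimizer, x, y, eps=8/255, alpha=10/255):
    # Random start inside the eps-ball, then a single gradient-sign step (FGSM).
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()

    # Update the model on the resulting adversarial examples.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

When the single-step perturbation above deteriorates into near-random noise, the model effectively trains against weak attacks only, which is the failure mode the paper aims to prevent.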