Keywords: adversarial training, adversarial examples, generalization bound
Abstract: The interactions between the update of model parameters and the update of perturbation operators complicate the dynamics of adversarial training (AT). This paper reveals a surprising behavior of AT, namely that the distribution induced by adversarial perturbations during AT becomes progressively more difficult to learn. We derive a generalization bound that theoretically attributes this behavior to the increase of a quantity associated with the perturbation operator, namely, its local dispersion. We corroborate this explanation with concrete experimental validations and show that this deteriorating behavior of the induced distributions is correlated with robust overfitting of AT.
Latex Source Code: zip
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission576/Authors, auai.org/UAI/2025/Conference/Submission576/Reproducibility_Reviewers
Submission Number: 576