Keywords: adversarial training, adversarial examples, generalization bound
Abstract: The interactions between the update of model parameters and the update of perturbation operators complicate the dynamics of adversarial training (AT). This paper reveals a surprising behavior of AT, namely that the distribution induced by adversarial perturbations during AT becomes progressively more difficult to learn. We derive a generalization bound that theoretically attributes this behavior to the increase of a quantity associated with the perturbation operator, namely, its local dispersion. We corroborate this explanation with concrete experimental validations and show that this deteriorating behavior of the induced distributions is correlated with robust overfitting of AT.
Latex Source Code: zip
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission576/Authors, auai.org/UAI/2025/Conference/Submission576/Reproducibility_Reviewers
Submission Number: 576