Adversarial Training May Induce Deteriorating Distributions

Published: 01 Jan 2025, Last Modified: 08 Sept 2025, UAI 2025, CC BY-SA 4.0
Abstract: The interactions between updates of the model parameters and updates of the perturbation operator complicate the dynamics of adversarial training (AT). This paper reveals a surprising behavior in AT: the distribution induced by adversarial perturbations during AT becomes progressively more difficult to learn. We derive a generalization bound that theoretically attributes this behavior to an increase in a quantity associated with the perturbation operator, namely its local dispersion. We corroborate this explanation with concrete experiments and show that this deterioration of the induced distributions is correlated with robust overfitting in AT.
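The abstract describes AT as two coupled updates: a perturbation operator (an inner attack) is built from the current model, and the model is then trained on the perturbed samples that operator induces. The following is a minimal sketch of that loop only, not the paper's experimental setup. It assumes a linear classifier with logistic loss, an L-infinity PGD attack, and toy Gaussian data; all function names (`pgd_perturb`, `adversarial_training`, `toy_data`) and hyperparameters are hypothetical. It records the perturbed training set at every epoch so the shifting induced distribution can be inspected, and it does not compute the paper's local dispersion quantity.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    """Return 1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def margin(w, b, x, y):
    """Signed margin y * (w . x + b) of a linear classifier."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def pgd_perturb(w, b, x, y, eps, alpha, steps):
    """Inner maximization: L-infinity PGD on the logistic loss.

    The result depends on the current parameters (w, b), so the
    perturbation operator changes whenever the model is updated.
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        s = sigmoid(-margin(w, b, x_adv, y))
        # Gradient of log(1 + exp(-margin)) with respect to the input x.
        grad = [-y * s * wi for wi in w]
        # Signed ascent step, then projection onto the eps-ball.
        delta = [max(-eps, min(eps, di + alpha * sign(gi)))
                 for di, gi in zip(delta, grad)]
    return [xi + di for xi, di in zip(x, delta)]


def adversarial_training(data, eps=0.3, alpha=0.1, steps=5, lr=0.5, epochs=20):
    """Alternate perturbation updates and full-batch gradient descent.

    Returns the final parameters and the perturbed training set
    produced at every epoch (the induced distribution).
    """
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    induced = []
    for _ in range(epochs):
        # 1) Apply the perturbation operator defined by the current model.
        adv = [(pgd_perturb(w, b, x, y, eps, alpha, steps), y) for x, y in data]
        induced.append(adv)
        # 2) Update the model on the induced (perturbed) samples.
        gw, gb = [0.0] * dim, 0.0
        for x_adv, y in adv:
            s = sigmoid(-margin(w, b, x_adv, y))
            gw = [g - y * s * xi for g, xi in zip(gw, x_adv)]
            gb -= y * s
        n = len(adv)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b, induced


def toy_data(n_per_class=20, seed=0):
    """Hypothetical toy data: two Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    data = []
    for y, center in ((1, 1.0), (-1, -1.0)):
        for _ in range(n_per_class):
            data.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], y))
    return data


if __name__ == "__main__":
    data = toy_data()
    w, b, induced = adversarial_training(data)
    shift = [max(abs(a - c) for a, c in zip(xa, x))
             for (x, _), (xa, _) in zip(data, induced[-1])]
    print("w =", w, "b =", b)
    print("max L-inf perturbation in last epoch:", max(shift))
```

Because the model starts from zero weights, the first epoch's induced set equals the clean data; in later epochs every point is moved by up to `eps` per coordinate in the direction the current model finds most harmful, so the training distribution moves whenever the parameters move.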