Keywords: robust overfitting, adversarial training, generalization
Abstract: Robust overfitting has been observed to arise in adversarial training. We hypothesize that this phenomenon may be related to the evolution of the data distribution along the training trajectory. To investigate this, we select a set of checkpoints from adversarial training and perform standard training on the distributions induced by adversarial perturbation with respect to those checkpoints. We observe that the obtained models generalize increasingly poorly when robust overfitting occurs, thereby validating the hypothesis. We show that the hardness of generalization on the induced distributions is related to a certain local property of the perturbation operator at each checkpoint, and we prove this connection by establishing an upper bound on the generalization error. Other interesting phenomena related to the adversarial training trajectory are also observed.
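A minimal sketch (not the authors' code) of the procedure described in the abstract: take a checkpoint from adversarial training, build the induced distribution by adversarially perturbing the training data with respect to that checkpoint, and then run standard training on the perturbed data. The PGD perturbation operator, the helper names (`pgd_perturb`, `induced_dataset`), and the toy model/data are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_perturb(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L_inf PGD attack: adversarial examples w.r.t. a fixed checkpoint `model`."""
    model.eval()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        # project back into the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()


def induced_dataset(checkpoint_model, loader):
    """Perturb every example w.r.t. the checkpoint; labels stay unchanged."""
    xs, ys = [], []
    for x, y in loader:
        xs.append(pgd_perturb(checkpoint_model, x, y))
        ys.append(y)
    return torch.utils.data.TensorDataset(torch.cat(xs), torch.cat(ys))


# --- toy usage with random data standing in for the real training set ---
model_ckpt = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # a saved AT checkpoint
fresh_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # trained from scratch

x = torch.rand(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(x, y), batch_size=32)

induced = torch.utils.data.DataLoader(induced_dataset(model_ckpt, loader), batch_size=32)

# standard (non-adversarial) training on the induced distribution
opt = torch.optim.SGD(fresh_model.parameters(), lr=0.1)
for epoch in range(5):
    for xb, yb in induced:
        opt.zero_grad()
        F.cross_entropy(fresh_model(xb), yb).backward()
        opt.step()
```

Repeating this for checkpoints taken at different epochs yields one freshly trained model per induced distribution, whose test performance can then be compared along the adversarial training trajectory.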
Submission Number: 68