Keywords: Adversarial Training, Synthesized Data, Robust Overfitting, Adversarial Robustness, Generalization
TL;DR: Adversarial Training improves robustness and generalization by stabilizing learning dynamics but faces a robustness-generalization trade-off caused by robust overfitting. Synthesized data helps AT balance these benefits, offering a new approach to robust and generalizable models.
Abstract: Adversarial Training (AT) is a well-known framework designed to mitigate adversarial vulnerabilities in neural networks. Recent research indicates that incorporating adversarial examples (AEs) in training can enhance models' generalization capabilities. To understand the impact of AEs on learning dynamics, we study AT through the lens of sample-difficulty methodologies. Our findings show that AT leads to more stable learning dynamics than Natural Training (NT), resulting in gradual performance improvements and less overconfident predictions. This suggests that AT steers training away from learning easy, perturbable spurious features and toward more resilient, generalizable ones. However, robust overfitting introduces a trade-off between adversarial robustness and generalization gains, limiting practical deployment. To address this, we propose bridging the gap with synthesized data. Our results demonstrate that AT, unlike NT, benefits significantly from synthesized data, enhancing generalization without compromising robustness and offering new avenues for developing robust and generalizable models.
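For readers unfamiliar with the AT framework referenced in the abstract, the sketch below shows a minimal PGD-based adversarial training loop in PyTorch. It is an illustration under common assumptions (L-infinity perturbations, eps=8/255, 10 attack steps), not the paper's exact training procedure; the function names and hyperparameters are hypothetical.

```python
# Minimal sketch of PGD-based adversarial training (illustrative only).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft adversarial examples with L-infinity PGD (random start)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project to eps-ball
            x_adv = x_adv.clamp(0, 1)                  # keep valid pixel range
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of the AT min-max objective: train on PGD examples."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)            # inner maximization
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)    # outer minimization
        loss.backward()
        optimizer.step()
```

In this formulation, NT corresponds to training on the clean inputs `x` directly, while AT replaces them with the worst-case perturbations found by the inner maximization.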
Submission Number: 104