Balancing Generalization and Robustness in Adversarial Training via Steering through Clean and Adversarial Gradient Directions
Abstract: Adversarial training (AT) is a fundamental method for enhancing the robustness of Deep Neural Networks (DNNs) against adversarial examples. While AT improves robustness on adversarial examples, it often reduces accuracy on clean examples. Considerable effort has been devoted to handling this trade-off from the perspective of the input space. However, we demonstrate that the trade-off can also be characterized from the perspective of the gradient space. In this paper, we propose Adversarial Training with Adaptive Gradient Reconstruction (AGR), a novel approach that balances generalization (accuracy on clean examples) and robustness (accuracy on adversarial examples) in adversarial training by steering through clean and adversarial gradient directions. We first introduce a technique named Gradient Orthogonal Projection, applied when the gradients are negatively correlated, which adjusts the adversarial gradient direction to reduce the degradation of generalization. We then present a gradient interpolation scheme, applied when the gradients are positively correlated, which efficiently increases generalization without compromising the robustness of the final obtained model. Rigorous theoretical analysis proves that AGR enjoys a lower upper bound on the generalization error, indicating its effectiveness. Comprehensive experiments empirically demonstrate that AGR achieves an excellent balance between generalization and robustness, and is compatible with various adversarial training methods, yielding superior performance.
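The two cases described in the abstract can be sketched as a single gradient-reconstruction rule. The following is a minimal illustrative sketch, not the paper's actual algorithm: the function name `reconstruct_gradient`, the interpolation weight `alpha`, and the plain-list gradient representation are all assumptions introduced here for clarity.

```python
def reconstruct_gradient(g_clean, g_adv, alpha=0.5):
    """Illustrative AGR-style gradient steering (hypothetical helper).

    If the clean and adversarial gradients are negatively correlated
    (negative dot product), remove from the adversarial gradient its
    component opposing the clean gradient (Gradient Orthogonal
    Projection), so the update degrades generalization less. If they
    are positively correlated, interpolate the two directions to gain
    generalization without sacrificing robustness. `alpha` is an
    assumed interpolation weight, not a parameter from the paper.
    """
    dot = sum(c * a for c, a in zip(g_clean, g_adv))
    if dot < 0:
        # Project g_adv onto the plane orthogonal to g_clean:
        # g_adv - (<g_clean, g_adv> / ||g_clean||^2) * g_clean
        norm_sq = sum(c * c for c in g_clean) + 1e-12
        scale = dot / norm_sq
        return [a - scale * c for c, a in zip(g_clean, g_adv)]
    # Positively correlated: convex combination of the two directions.
    return [alpha * c + (1 - alpha) * a for c, a in zip(g_clean, g_adv)]
```

After the projection branch, the reconstructed gradient is orthogonal to the clean gradient by construction, so a descent step along it no longer has a first-order effect that increases the clean loss.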