Abstract: Adversarial Training (AT) has been demonstrated to improve the robustness of deep neural networks (DNNs) to adversarial attacks. AT is a min-max optimization procedure in which adversarial examples are generated to train a robust DNN. Generally, AT utilizes un-targeted adversarial examples because the inner maximization step maximizes the losses of inputs w.r.t. their actual classes, pushing them toward misclassification into any incorrect class. The outer minimization then minimizes the losses on the adversarial examples obtained from the inner maximization. This work proposes a standard-deviation-inspired (SDI) regularization term for improving adversarial robustness and generalization. We argue that the inner maximization is akin to minimizing a modified standard deviation of a model’s output probabilities. Moreover, we argue that maximizing the modified standard deviation measure may complement the outer minimization of the AT framework. To corroborate our argument, we experimentally show that the SDI measure may be utilized to craft adversarial examples. Furthermore, we show that combining the proposed SDI regularization term with existing AT variants improves the robustness of DNNs to stronger attacks (e.g., CW and Auto-attack) and improves robust generalization.
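The abstract's core intuition — that the spread (standard deviation) of a model's output probability vector tracks prediction confidence, and can therefore serve as a regularization signal — can be sketched in plain NumPy. This is a minimal illustration only: the function name `sdi_measure` and the use of the unmodified standard deviation are assumptions for exposition, not the paper's exact "modified" SDI formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sdi_measure(logits):
    # Hypothetical stand-in for the paper's SDI term: the standard
    # deviation of the output probability vector. A confident (peaked)
    # prediction spreads probability unevenly, giving a large std;
    # a maximally uncertain (uniform) prediction gives zero std.
    p = softmax(logits)
    return p.std(axis=-1)

# A peaked prediction yields a strictly larger SDI value than a uniform one.
confident = np.array([5.0, 0.0, 0.0, 0.0])
uniform = np.array([1.0, 1.0, 1.0, 1.0])
print(sdi_measure(confident) > sdi_measure(uniform))  # True
```

Under this reading, an attack that *minimizes* such a measure drives the model toward maximal uncertainty about the true class, while *maximizing* it during the outer minimization encourages confident, well-separated predictions on adversarial examples.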
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Takashi_Ishida1
Submission Number: 3200