Abstract: Adversarial Training (AT) has been demonstrated to improve the robustness of deep neural networks (DNNs) to adversarial attacks. AT is a min-max optimization procedure in which adversarial examples are generated to train a robust DNN. Generally, AT utilizes un-targeted adversarial examples because the inner maximization step maximizes the losses of inputs w.r.t. their actual classes, pushing them toward misclassification into any incorrect class. The outer minimization then minimizes the losses on the adversarial examples obtained from the inner maximization. This work proposes a standard-deviation-inspired (SDI) regularization term for improving adversarial robustness and generalization. We argue that the inner maximization is akin to minimizing a modified standard deviation of a model’s output probabilities. Moreover, we argue that maximizing the modified standard deviation measure may complement the outer minimization of the AT framework. To corroborate our argument, we experimentally show that the SDI measure may be utilized to craft adversarial examples. Furthermore, we show that combining the proposed SDI regularization term with existing AT variants improves the robustness of DNNs to stronger attacks (e.g., CW and Auto-attack) and improves robust generalization.
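The abstract's core intuition — that the spread (standard deviation) of a model's output probability vector tracks prediction confidence, and can therefore serve as a regularization signal — can be sketched in plain NumPy. This is a minimal illustration only: the function name `sdi_measure` and the use of the unmodified standard deviation are assumptions for exposition, not the paper's exact "modified" SDI formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sdi_measure(logits):
    # Hypothetical stand-in for the paper's SDI term: the standard
    # deviation of the output probability vector. A confident (peaked)
    # prediction spreads probability unevenly, giving a large std;
    # a maximally uncertain (uniform) prediction gives zero std.
    p = softmax(logits)
    return p.std(axis=-1)

# A peaked prediction yields a strictly larger SDI value than a uniform one.
confident = np.array([5.0, 0.0, 0.0, 0.0])
uniform = np.array([1.0, 1.0, 1.0, 1.0])
print(sdi_measure(confident) > sdi_measure(uniform))  # True
```

Under this reading, an attack that *minimizes* such a measure drives the model toward maximal uncertainty about the true class, while *maximizing* it during the outer minimization encourages confident, well-separated predictions on adversarial examples.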
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Takashi_Ishida1
Submission Number: 3200