Pixel Reweighted Adversarial Training

18 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Adversarial Training
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Adversarial training (AT) is a well-known defensive framework that trains a model with generated adversarial examples (AEs). AEs are crafted by intentionally adding perturbations to natural images, aiming to mislead the model into making erroneous outputs. In existing AT methods, the magnitude of perturbations is usually constrained by a predefined perturbation budget, denoted as $\epsilon$, which is kept the same for each dimension of the image (i.e., each pixel within an image). However, in this paper, we discover that not all pixels contribute equally to the accuracy on AEs (i.e., robustness) and the accuracy on natural images (i.e., accuracy). Motivated by this finding, we propose a new framework called Pixel-reweighted AdveRsarial Training (PART), which partially lowers $\epsilon$ for pixels that rarely influence the model's outputs, guiding the model to focus more on regions where pixels are important for its outputs. Specifically, we first use class activation mapping (CAM) methods to identify important pixel regions; we then keep the perturbation budget for these regions while lowering it for the remaining regions when generating AEs. Finally, we use these reweighted AEs to train a model. PART achieves a notable improvement in the robustness-accuracy trade-off on CIFAR-10, SVHN and Tiny-ImageNet, and serves as a general framework that integrates seamlessly with a variety of AT, CAM and AE generation methods. More importantly, our work revisits the conventional AT framework and justifies the necessity of allocating distinct weights to different pixel regions during AT.
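The core idea in the abstract — a per-pixel perturbation budget that keeps $\epsilon$ on CAM-identified important regions and lowers it elsewhere — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the thresholding of the CAM map, and the parameters `low_ratio` and `thresh` are all hypothetical choices made here for clarity, and a single FGSM-style step stands in for the full AE generation procedure.

```python
import numpy as np

def pixelwise_budget(cam, eps=8 / 255, low_ratio=0.5, thresh=0.5):
    """Build a per-pixel budget map: keep `eps` where the (normalized)
    CAM map marks important pixels, lower it elsewhere.
    `low_ratio` and `thresh` are hypothetical parameters."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)  # scale to [0, 1]
    return np.where(cam >= thresh, eps, eps * low_ratio)

def fgsm_step_reweighted(x, grad_sign, eps_map):
    """One FGSM-style perturbation step, but with a pixel-wise budget
    instead of a uniform epsilon; result clipped to valid image range."""
    return np.clip(x + eps_map * grad_sign, 0.0, 1.0)

# Toy 2x2 "image": high CAM activation in the first column only.
cam = np.array([[0.9, 0.1],
                [0.8, 0.2]])
eps_map = pixelwise_budget(cam, eps=0.1, low_ratio=0.5, thresh=0.5)
x = np.full((2, 2), 0.5)
grad_sign = np.ones((2, 2))          # stand-in for sign of the loss gradient
x_adv = fgsm_step_reweighted(x, grad_sign, eps_map)
```

Important pixels receive the full budget (0.1) while the rest receive the reduced one (0.05), so the resulting AE perturbs salient regions more strongly.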
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1033