Revisiting Image Classifier Training for Improved Certified Robust Defense against Adversarial Patches

Aniruddha Saha; Shuhua Yu; Mohammad Sadegh Norouzzadeh; Wan-Yi Lin; Chaithanya Kumar Mummadi

Published: 02 Oct 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Certifiably robust defenses against adversarial patches for image classifiers ensure correct prediction against any changes to a constrained neighborhood of pixels. PatchCleanser, the state-of-the-art certified defense, uses a double-masking strategy for robust classification. The success of this strategy relies heavily on the model's invariance to image pixel masking. In this paper, we take a closer look at model training schemes to improve this invariance. Instead of using Random Cutout augmentations like PatchCleanser, we introduce the notion of worst-case masking, i.e., selecting masked images which maximize classification loss. However, finding worst-case masks requires an exhaustive search, which might be prohibitively expensive to do on the fly during training. To solve this problem, we propose a two-round greedy masking strategy (Greedy Cutout) which finds an approximate worst-case mask location with much less compute. We show that models trained with our Greedy Cutout improve certified robust accuracy over Random Cutout in PatchCleanser across a range of datasets and architectures. Certified robust accuracy on ImageNet with a ViT-B16-224 model increases from 58.1% to 62.3% against a 3% square patch applied anywhere on the image.
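
The two-round greedy search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the grid of candidate mask locations, and the `loss_fn` callable (standing in for the classifier's loss on the true label) are all assumptions made for illustration. Round one picks the single mask with the highest loss; round two keeps that mask fixed and picks the second mask that maximizes the loss of the doubly masked image.

```python
from itertools import product


def apply_mask(image, top, left, size, fill=0.0):
    """Return a copy of a 2-D image (list of rows) with one square region filled."""
    masked = [row[:] for row in image]
    for r in range(top, min(top + size, len(masked))):
        for c in range(left, min(left + size, len(masked[r]))):
            masked[r][c] = fill
    return masked


def mask_locations(height, width, size, stride):
    """Top-left corners of a regular grid of square masks (hypothetical layout)."""
    return list(product(range(0, height - size + 1, stride),
                        range(0, width - size + 1, stride)))


def greedy_two_round_masks(image, loss_fn, locations, size):
    """Approximate the worst-case pair of masks with two greedy rounds.

    loss_fn is an assumed stand-in for the classification loss of a
    masked image; higher means worse for the classifier.
    """
    # Round 1: evaluate every single mask and keep the one with the highest loss.
    first = max(locations,
                key=lambda loc: loss_fn(apply_mask(image, loc[0], loc[1], size)))
    once_masked = apply_mask(image, first[0], first[1], size)

    # Round 2: with the first mask fixed, pick the second mask that
    # maximizes the loss of the doubly masked image.
    second = max(locations,
                 key=lambda loc: loss_fn(apply_mask(once_masked, loc[0], loc[1], size)))
    return first, second
```

With k candidate locations, this sketch evaluates the loss 2k times, whereas an exhaustive search over ordered pairs of masks would need k² evaluations.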

Submission Length: Regular submission (no more than 12 pages of main content)

Supplementary Material: zip

Assigned Action Editor: ~Pin-Yu_Chen1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1381
