Certifiable Robustness Against Patch Attacks Using an ERM Oracle

Kevin Stangl; Avrim Blum; Omar Montasser; Saba Ahmadi

Certifiable Robustness Against Patch Attacks Using an ERM Oracle

Kevin Stangl, Avrim Blum, Omar Montasser, Saba Ahmadi

Published: 05 Dec 2022, Last Modified: 05 May 2023MLSW2022Readers: Everyone

Abstract: Consider patch attacks, where at test-time an adversary manipulates a test image with a patch in order to induce a targeted mis-classification. We consider a recent defense to patch attacks, Patch-Cleanser (Xiang et al., 2022). The Patch-Cleanser algorithm requires a prediction model to have a “two-mask correctness” property, meaning that the prediction model should correctly classify any image when any two blank masks replace portions of the image. To this end, Xiang et al. (2022) learn a prediction model to be robust to two-mask operations by augmenting the training set by adding pairs of masks at random locations of training images, and performing empirical risk minimization (ERM) on the augmented dataset. However, in the non-realizable setting when no predictor is perfectly correct on all two-mask operations on all images, we exhibit an example where ERM fails. To overcome this challenge, we propose a different algorithm that provably learns a predictor robust to all two-mask operations using an ERM oracle, based on prior work by Feige et al. (2015a) .

1 Reply

Loading