Denoised Smoothing with Sample Rejection for Robustifying Pretrained Classifiers

Published: 21 Nov 2022, Last Modified: 05 May 2023, TSRML2022, Readers: Everyone
Keywords: Adversarial robustness, randomized smoothing, certifiable defense, sample rejection.
TL;DR: To make pretrained classifiers robust against adversarial attacks, we propose to augment the denoised smoothing system with a "rejector", and we prove its certifiability and show its empirical advantage over denoised smoothing alone.
Abstract: Denoised smoothing is the state-of-the-art approach to defending pretrained classifiers against $\ell_p$ adversarial attacks, where a denoiser is prepended to the pretrained classifier, and the joint system is adversarially verified via randomized smoothing. Despite its state-of-the-art certified robustness against $\ell_2$-norm adversarial inputs, the pretrained base classifier is often quite uncertain when making its predictions on the denoised examples, which leads to lower natural accuracy. In this work, we show that by augmenting the joint system with a ``rejector'' and exploiting adaptive sample rejection (i.e., intentionally abstaining from providing a prediction), we can achieve substantially improved accuracy (especially natural accuracy) over denoised smoothing alone. That is, we show how the joint classifier-rejector can be viewed as a classification-with-rejection per sample, while the smoothed joint system can be turned into a robust \emph{smoothed classifier without rejection} against $\ell_2$-norm perturbations while retaining certifiability. Experiments on the CIFAR10 dataset show considerable improvements in \emph{natural} accuracy without degrading adversarial performance, with affordably-trainable rejectors, especially for medium and large values of the noise parameter $\sigma$.
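As a rough illustration of the pipeline described in the abstract, the following minimal sketch draws Gaussian-noised copies of an input, passes each through a denoiser, lets a rejector abstain per sample, and takes a majority vote over the accepted samples. The abstract does not say how rejected samples enter the vote, so skipping them is an assumption here, and the names `smoothed_predict`, `toy_denoiser`, `toy_classifier`, and `toy_rejector` are hypothetical stand-ins rather than the authors' components. No certification step is included.

```python
import numpy as np


def smoothed_predict(x, denoiser, classifier, rejector, sigma, n_samples,
                     num_classes, rng):
    """Monte Carlo majority vote over noisy copies of x.

    Assumption (not from the source): samples on which the rejector
    abstains are skipped and do not contribute a vote.
    """
    votes = np.zeros(num_classes, dtype=int)
    n_rejected = 0
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)  # Gaussian smoothing noise
        denoised = denoiser(noisy)
        if rejector(denoised):  # per-sample abstention
            n_rejected += 1
            continue
        votes[classifier(denoised)] += 1
    if votes.sum() == 0:
        # Every sample was rejected, so there is no vote to report.
        return None, votes, n_rejected
    return int(np.argmax(votes)), votes, n_rejected


# Hypothetical toy components, for illustration only.
def toy_denoiser(z):
    return np.clip(z, -3.0, 3.0)


def toy_classifier(z):
    return 1 if z.mean() > 0 else 0


def toy_rejector(z, margin=0.2):
    # Abstain when the toy classifier's score is close to its decision boundary.
    return abs(z.mean()) < margin


rng = np.random.default_rng(0)
x = np.full(16, 1.0)
n_samples = 200
pred, votes, n_rejected = smoothed_predict(
    x, toy_denoiser, toy_classifier, toy_rejector,
    sigma=0.5, n_samples=n_samples, num_classes=2, rng=rng,
)
print(pred, votes, n_rejected)
```

In this toy setup the rejector abstains only when the mean of the denoised sample lies near zero, which mimics rejecting samples on which the base classifier is uncertain; a trained rejector would replace this rule.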