Abstract: Randomized smoothing provides neural network models with verifiable robustness against adversarial attacks. Most randomized smoothing defenses certify a probabilistic guarantee within an $\ell_p$ norm ball around each data point. However, certifying sufficient robustness radii against perturbations that are large in $\ell_p$ norm requires smoothing the model input with a large-variance noise. Consequently, the resulting classifier usually exhibits poor certified accuracy. Moreover, it might be impossible to obtain large robustness radii for certain data points and models even when a large-variance smoothing noise is adopted. For instance, these robustness guarantees are often vulnerable to unrestricted or semantic perturbations, which have large $\ell_p$ norm but remain imperceptible to human eyes. In this paper, we propose Concert, a method for certifying the robustness of neural network-based image classifiers with context-aware smoothing noise, guided by pixel-wise entropy values measured by a colorization model. In lieu of sampling noise from univariate Gaussian distributions, we increase the Gaussian variance in high-entropy dimensions that are more vulnerable to adversarial manipulation, while keeping a moderate variance in low-entropy dimensions. Concert acquires larger robustness radii on input dimensions that are prone to adversarial perturbations while preserving certified accuracy, since the other input dimensions are not significantly randomized. We show that our method's certified accuracy and robustness radii on benign images are on par with those of state-of-the-art smoothing techniques, while outperforming them on semantically perturbed images.
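A minimal sketch of the entropy-guided anisotropic noise idea described above: per-pixel standard deviation interpolates between a moderate and a large value according to a (normalized) entropy map. The function name, the hyperparameters `sigma_low`/`sigma_high`, and the linear interpolation scheme are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def entropy_guided_noise(entropy, sigma_low=0.25, sigma_high=1.0, rng=None):
    """Sample smoothing noise whose per-pixel std dev scales with entropy.

    High-entropy pixels (assumed more vulnerable to adversarial
    manipulation) receive larger variance; low-entropy pixels keep a
    moderate variance. Hyperparameters here are illustrative only.
    """
    rng = np.random.default_rng(rng)
    # Normalize entropy to [0, 1]; the epsilon guards against a
    # constant entropy map (zero range).
    e = (entropy - entropy.min()) / (np.ptp(entropy) + 1e-12)
    # Linearly interpolate the per-pixel standard deviation.
    sigma = sigma_low + (sigma_high - sigma_low) * e
    # Anisotropic (diagonal-covariance) Gaussian noise.
    noise = sigma * rng.standard_normal(entropy.shape)
    return noise, sigma
```

In a randomized-smoothing pipeline, this noise would be added to the input before each of the Monte Carlo forward passes used for certification; the per-dimension `sigma` then enters the radius computation in place of a single global variance.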
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Gautam_Kamath1
Submission Number: 942