Robustness between the worst and average case

Leslie Rice; Anna Bair; Huan Zhang; J Zico Kolter

Published: 09 Nov 2021, Last Modified: 05 May 2023, Venue: NeurIPS 2021 Poster, Readers: Everyone

Keywords: adversarial robustness, data augmentation, Hamiltonian Monte Carlo, path sampling

TL;DR: We define a new metric for evaluating classifier robustness by introducing an interpolation between robustness over random and adversarial perturbations, and propose an MCMC-based sampler to effectively evaluate this metric.

Abstract: Several recent works in machine learning have focused on evaluating the test-time robustness of a classifier: how well the classifier performs not just on the target domain it was trained on, but also on perturbed examples. In these settings, the focus has largely been on two extremes of robustness: the robustness to perturbations drawn _at random_ from within some distribution (i.e., robustness to random perturbations), and the robustness to the _worst case_ perturbation in some set (i.e., adversarial robustness). In this paper, we argue that a sliding scale between these two extremes provides a valuable additional metric by which to gauge robustness. Specifically, we illustrate that each of these two extremes is naturally characterized by a (functional) q-norm over perturbation space, with $q=1$ corresponding to robustness to random perturbations and $q=\infty$ corresponding to adversarial perturbations. We then present the main technical contribution of our paper: a method for efficiently estimating the value of these norms by interpreting them as the partition function of a particular distribution, then using path sampling with MCMC methods to estimate this partition function (either traditional Metropolis-Hastings for non-differentiable perturbations, or Hamiltonian Monte Carlo for differentiable perturbations). We show that our approach provides substantially better estimates than simple random sampling of the actual “intermediate-q” robustness of standard, data-augmented, and adversarially-trained classifiers, illustrating a clear tradeoff between classifiers that optimize different metrics. Code for reproducing experiments can be found at https://github.com/locuslab/intermediate_robustness.
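To make the q-norm interpolation concrete, here is a minimal sketch of the simple random sampling baseline that the abstract compares against. It is not the authors' path sampling or MCMC estimator. It assumes the q-norm of a non-negative loss over perturbations takes the form (E[loss^q])^(1/q), and the names `q_norm_estimate` and `toy_loss`, the uniform perturbation distribution, and the perturbation radius are all hypothetical choices made for illustration.

```python
import numpy as np


def q_norm_estimate(losses, q):
    """Naive Monte Carlo estimate of (E[loss^q])^(1/q) from sampled losses.

    Assumes losses are non-negative and q > 0; q = inf returns the sample max.
    """
    losses = np.asarray(losses, dtype=float)
    if np.isinf(q):
        return float(losses.max())
    # Work in log space so that large q does not overflow.
    with np.errstate(divide="ignore"):
        log_l = np.log(losses)
    m = log_l.max()
    if np.isneginf(m):  # every sampled loss is exactly zero
        return 0.0
    log_mean = q * m + np.log(np.mean(np.exp(q * (log_l - m))))
    return float(np.exp(log_mean / q))


def toy_loss(deltas):
    """Hypothetical loss that is large only in one corner of the perturbation set."""
    return 1.0 / (1.0 + np.exp(-50.0 * (deltas.sum(axis=1) - 0.15)))


# Assumption: perturbations are drawn uniformly from a small box.
rng = np.random.default_rng(0)
deltas = rng.uniform(-0.1, 0.1, size=(10_000, 2))
losses = toy_loss(deltas)

for q in (1, 10, 100, np.inf):
    print(q, q_norm_estimate(losses, q))
```

With q=1 the estimate is the plain average loss over random perturbations, and as q grows it moves toward the largest sampled loss. Because high-loss perturbations can be rare under random sampling, an estimator like this one can badly underestimate the norm at large q, which is the gap the abstract's MCMC-based approach is meant to address.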

Supplementary Material: zip

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
