\section{Introduction}

A desirable attribute of machine learning models is robustness to perturbations of input data. A popular notion of robustness is adversarial robustness, the ability of a model to maintain its prediction when presented with adversarial perturbations, i.e., perturbations designed to cause the model to change its prediction. Although adversarial robustness identifies whether a misclassified example exists in a local region around an input, it fails to capture the degree of vulnerability of that input, as indicated by how difficult it is to find an adversarial example. For example, if the model geometry is such that 99\% of the local region around an input (say, point $A$) is correctly classified, an adversarial example is much harder to find than when only 1\% of the local region (say, around point $B$) is correctly classified, in which case even random perturbations are likely to be misclassified. However, from the adversarial robustness perspective, the prediction at a point is declared either robust or not, and thus both points $A$ and $B$ are considered equally non-robust (see Figure~\ref{fig:main-fig} for an illustrative example). The ease of obtaining a misclassification, or \textit{data point vulnerability}, is captured by another kind of robustness: \emph{average-case robustness}, i.e., the fraction of points in a local region around an input for which the model provides consistent predictions.\footnote{In addition to the size of the misclassified region, another factor that affects the ease of finding misclassified examples is the specific optimization method used. In this study, we aim to study model robustness in a manner agnostic to the specific optimization method used, and thus we focus only on the size of the misclassified region.
We believe this study can form the basis for future studies looking into the properties (e.g., ``ease of identifying misclassified examples'') of specific optimization methods.} If this fraction is less than one, then an adversarial perturbation exists. The smaller this fraction, the easier it is to find a misclassified example. %In this sense, when computed over the same neighborhoods, average-case robustness strictly generalizes adversarial robustness, providing a more comprehensive view of model behavior. 
While adversarial robustness is motivated by model security, average-case robustness is better suited to understanding and debugging models and datasets.
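
As a minimal formalization of the notion described above, average-case robustness at an input $x$ can be written as follows. The notation is illustrative and assumed only for this sketch: $f_i(x)$ denotes the model's score for class $i$ at $x$, and $\mathcal{D}$ is a chosen distribution over perturbations $\delta$ supported on the local region (e.g., uniform on an $\epsilon$-ball).
\[
    p^{\mathrm{robust}}(x) \;=\; \Pr_{\delta \sim \mathcal{D}} \left[ \arg\max_i f_i(x + \delta) = \arg\max_i f_i(x) \right].
\]
Under this reading, and assuming the predictions at $A$ and $B$ are themselves correct, the example above corresponds to $p^{\mathrm{robust}}(A) = 0.99$ and $p^{\mathrm{robust}}(B) = 0.01$. A criterion based on adversarial robustness flags both points as non-robust because both values are below one, whereas the value itself ranks $B$ as far more vulnerable than $A$.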


Standard approaches to computing average-case robustness involve Monte Carlo sampling, which is computationally inefficient, especially for high-dimensional data. For example, \citet{cohen2019certified} use $n = 100{,}000$ Monte Carlo samples per data point to compute this quantity. %This computational inefficiency is exacerbated when computing average-case robustness for many data points (e.g., for all points in a dataset). 
In this paper, we propose to compute average-case robustness via analytical estimators, reducing the computational burden while simultaneously providing insight into model decision boundaries. Our estimators are exact for linear models and provide accurate approximations for non-linear models, especially those with small local curvature \cite{moosavi2019robustness, srinivas2022efficient}. Overall, our work makes the following contributions:

\begin{enumerate}
    \item We derive novel analytical estimators to efficiently compute the average-case robustness of multi-class classifiers. We also provide estimation error bounds for these estimators that characterize the approximation error for non-linear models.

    \item We empirically validate our analytical estimators on standard deep learning models and datasets, demonstrating that these estimators accurately and efficiently estimate average-case robustness.

    \item We demonstrate the usefulness of our estimators in two case studies: identifying vulnerable samples in a dataset and measuring class-level robustness bias \cite{nanda2021fairness}, where we find that standard models exhibit significant robustness bias among classes. 
\end{enumerate}


To our knowledge, this work is the first to investigate analytical estimation of average-case robustness for the multi-class setting. In addition, the efficiency of these estimators makes the computation of average-case robustness practical, especially for large deep neural networks.

\begin{figure}
    \centering
    \includegraphics[width=0.9\linewidth]{figures/robustness_estimation_3.pdf}
    \caption{Consider a binary classifier (green vs. yellow) and points $A$ (left) and $B$ (right), both correctly classified as yellow. The dotted red circles represent $\epsilon$-balls around the data points. Although adversarial robustness rightly considers the model non-robust at both points (because adversarial examples exist within each $\epsilon$-ball), it fails to discern that point $B$ has a larger fraction of misclassified points in its neighborhood and is therefore more vulnerable than point $A$. Average-case robustness captures exactly this difference.}
    \label{fig:main-fig}
\end{figure}



