\documentclass[accepted]{uai2022} 

%% Choose your variant of English; be consistent
\usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib}
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{booktabs}
\usepackage{tikz}
\usepackage{multirow}
\usepackage[ruled]{algorithm2e}
\usepackage{xr}

% For cross referencing across files
\makeatletter
\newcommand*{\addFileDependency}[1]{% argument=file name and extension
  \typeout{(#1)}
  \@addtofilelist{#1}
  \IfFileExists{#1}{}{\typeout{No file #1.}}
}
\makeatother

\newcommand*{\myexternaldocument}[1]{%
    \externaldocument{#1}%
    \addFileDependency{#1.tex}%
    \addFileDependency{#1.aux}%
}

\myexternaldocument{perez_396-supp}

% Add-on Math stuff
\newtheorem*{remark}{Remark}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}

\title{Attribution of Predictive Uncertainties in Classification Models}

% Add authors
\author{Iker Perez}
\author{Piotr Skalski}
\author{Alec Barns-Graham}
\author{Jason Wong}
\author{David Sutton}
\affil{%
    Featurespace Research\\
    Cambridge\\
    United Kingdom
}

\begin{document}
\maketitle

\begin{abstract}
Predictive uncertainties in classification tasks are often a consequence of model inadequacy or insufficient training data. In popular applications, such as image processing, we are often required to scrutinise these uncertainties by meaningfully attributing them to input features. This helps to improve interpretability assessments. However, there exist few effective frameworks for this purpose. Vanilla forms of popular methods for the provision of saliency masks, such as SHAP or integrated gradients, adapt poorly to target measures of uncertainty. Thus, state-of-the-art tools instead proceed by creating \textit{counterfactual} or \textit{adversarial} feature vectors, and assign attributions by direct comparison to original images. In this paper, we present a novel framework that combines path integrals, counterfactual explanations and generative models, in order to procure attributions that contain few observable artefacts or noise. We show that this approach outperforms existing alternatives through quantitative evaluations with popular benchmarking methods and data sets of varying complexity. 

\end{abstract}

\section{Introduction}\label{sec:intro}

Model uncertainties often manifest aspects of a system or data generating process that are not exactly understood \citep{hullermeier2021aleatoric}, such as the influence of model inadequacy or a lack of diverse and representative data used during training. The ability to quantify and attribute such uncertainties to their sources can help to scrutinise the functioning of a predictive model, and facilitate interpretability or fairness assessments in important machine learning applications \citep{awasthi2021evaluating}. The process is especially relevant in Bayesian inferential settings, which find applications in domains such as natural language processing \citep{xiao2019quantifying}, network analysis \citep{perez2021variational} or image processing \citep{kendall2017uncertainties}, to name only a few. 

Thus, there exists a growing interest in methods for uncertainty estimation \citep[e.g.][]{depeweg2018decomposition, DBLP:conf/uai/SmithG18, van2020uncertainty, tuna2021exploiting} for purposes such as procuring adversarial examples, active learning or \textit{out-of-distribution} detection. Recent work has proposed mechanisms for the attribution of predictive uncertainties to input features, such as pixels in an image \citep{van2019interpretable, antoran2021getting, schut2021generating}, with the goal of complementing interpretability tools that are disproportionately centred on explaining model scores, and of improving transparency in deployments of predictive models. These methods proceed by identifying \textit{counterfactual} (in-distribution) or \textit{adversarial} (out-of-distribution) explanations, i.e. small variations in the values of input features which yield new model scores with minimal uncertainty. This has helped to understand the strengths and weaknesses of various models. However, the relative contribution of individual pixels to poor model performance is left to human guesswork, or assigned by plain comparisons between an image and its altered representation. We report that uncertainty attributions derived following these approaches perform poorly when measured by popular quantitative evaluations of image saliency maps. 

In this paper, our goal is similarly to map uncertainties in classification tasks to their origin in images, and to measure the relative contribution of each individual pixel. We show that popular attribution methods based on \textit{segmentation} \citep{ribeiro2016should}, \textit{resampling} \citep{kernelshap} or \textit{path integrals} \citep{sundararajan2017axiomatic} are easily repurposed for this task. However, we find that naive applications of these approaches perform poorly. Thus, we present a new framework through a novel combination of path integrals, counterfactual explanations and generative models. Our approach is to attribute uncertainties by traversing a domain of integration defined in latent space, which connects a counterfactual explanation with its original image. The integration is projected into the observable pixel space through a generative model, and starts at a reference point which bears no predictive uncertainty. Hence, \textit{completeness} is satisfied and uncertainties are fully explained and decomposed over pixels in an image.

We note that relying on generative models has recently gained traction for interpretability and score attribution purposes \citep{lang2021explaining}. Through our method, we show how to leverage these models in order to procure clustered saliency maps, which reduce the observable noise present in vanilla approaches. Applied to uncertainty attribution tasks, the proposed approach outperforms vanilla adaptations of popular interpretability tools such as LIME \citep{ribeiro2016should}, SHAP \citep{kernelshap} or integrated gradients \citep{sundararajan2017axiomatic}, as well as \textit{blur} and \textit{guided} variants \citep{xu2020attribution, kapishnikov2021guided}. We further combine these methods with \textit{Xrai} \citep{kapishnikov2019xrai}, a popular segmentation and attribution approach. The assessment\footnote{Source code for reproducing these results can be found at \href{http://github.com/Featurespace/uncertainty-attribution}{github.com/Featurespace/uncertainty-attribution}.} is carried out through both quantitative and qualitative evaluations, using popular benchmarking methods and data sets of varying complexity. 

\section{Uncertainty Attributions}\label{sec:attributions}

Consider a classification task with a classifier $f:\mathbb{R}^n \times \mathcal{W} \rightarrow \Delta^{|\mathcal{C}|-1}$ of a fixed architecture. The weights $\boldsymbol{w}\in\mathcal{W}$ are presumed to be fitted to some available training data set $\mathcal{D}=\{(\boldsymbol{x}_i, c_i)\}_{i=1, 2,\dots}$. Thus, the function $f(\boldsymbol{x}) \equiv f(\boldsymbol{x}, \boldsymbol{w})$ maps feature vectors $\boldsymbol{x}\in\mathbb{R}^n$ to an element in the standard $(|\mathcal{C}|-1)$-simplex, which represents membership probabilities across classes in a set $\mathcal{C}$. In the following, we are concerned with the \textit{entropy} as a measure of predictive uncertainty, i.e.
\begin{equation}
H(\boldsymbol{x})=-\sum_{c\in\mathcal{C}} f_c(\boldsymbol{x}) \cdot \log f_c(\boldsymbol{x}) \label{entropy}
\end{equation}
where $f_c(\boldsymbol{x})$ represents the predicted probability of class-$c$ membership. In Bayesian settings, we often consider a posterior distribution $\pi(\boldsymbol{w}|\mathcal{D})$ over weights in the model, and the term \eqref{entropy} may further be decomposed into \textit{aleatoric} and \textit{epistemic} components \citep{kendall2017uncertainties}. These represent different types of uncertainty, arising for instance from inadequate data or inappropriate modelling choices. For simplicity in the presentation, we defer those details to Section \ref{app:Bayesian} in the supplementary material.
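As a concrete reading of \eqref{entropy}, the following Python sketch evaluates the entropy of a single membership vector $f(\boldsymbol{x})$, adopting the usual convention $0\log 0 = 0$ for classes with zero probability. It is an illustrative helper rather than part of the released code, and the function name is hypothetical.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Entropy H = -sum_c f_c log f_c of a class-membership vector.

    Zero-probability classes contribute nothing (convention 0 log 0 = 0).
    """
    if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected a point on the probability simplex")
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A confident prediction carries no uncertainty; a uniform one attains log|C|.
confident = predictive_entropy([1.0, 0.0, 0.0])
uniform = predictive_entropy([1 / 3, 1 / 3, 1 / 3])
```

A one-hot prediction yields $H=0$, while a uniform prediction over $|\mathcal{C}|$ classes attains the maximum value $\log|\mathcal{C}|$.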

Popular \textit{resampling}- or \textit{gradient}-based methods can easily be adapted to attribute measures of uncertainty such as $H(\boldsymbol{x})$ to input features in an image. This includes tools such as LIME \citep{ribeiro2016should}, SHAP \citep{kernelshap} or \textit{integrated gradients} (IG) \citep{sundararajan2017axiomatic}. In Figure \ref{dog_un_expl}, we show an example application of integrated gradients to \textit{dogs versus cats} data (further examples are found in Section~\ref{app:Examples} in the supplementary material). In the figure, the regions in red are identified as contributors to predictive uncertainty. We readily comprehend why the model struggles to predict any single class by observing that a leash and a human hand are problematic. To the best of our knowledge, no research has yet explored the possibility of using these attribution methods to identify sources of uncertainty. Nevertheless, quantitative evaluations presented in Section \ref{sec:experiments} show that this approach generally performs poorly. 

\begin{figure}[t!]
\centering
\includegraphics[width=0.48\textwidth]{images/main_dogs-vs-cats}
\caption{Example uncertainty attributions using integrated gradients. Classification task in \textit{dogs versus cats} data. In red, positive attributions which \textit{increase} entropy; in purple, negative attributions that \textit{decrease} entropy.} \label{dog_un_expl}
\end{figure}

\subsection{Path Integrals}

For later reference, we illustrate the above uncertainty attribution procedure with integrated gradients. In its basic form, a path method explains a scalar output $F(\boldsymbol{x})$ using a \textit{fiducial} image $\boldsymbol{x}^0$ as reference, which is presumed not to be associated with any class observed in training data. The importance attributed to \textit{pixel} $i$ for the purposes of explaining the quantity $F(\boldsymbol{x})$ is given by
$$\text{attr}^\delta_i(\boldsymbol{x}) = \int_0^1 \frac{\partial F(\delta(\alpha))}{\partial \delta_i(\alpha)}
\frac{\partial \delta_i(\alpha)}{\partial \alpha} d\alpha$$
where $\delta:[0, 1] \rightarrow \mathbb{R}^n$ represents a curve with endpoints at $\delta(0) = \boldsymbol{x}^0$ and $\delta(1) = \boldsymbol{x}$. Here, $\sum_i \text{attr}^\delta_i(\boldsymbol{x}) = F(\boldsymbol{x}) - F(\boldsymbol{x}^0)$ follows from the \textit{gradient theorem} for line integrals, s.t. the difference in output values decomposes over the sum of attributions. Commonly, $F(\boldsymbol{x})=f_c(\boldsymbol{x})$ represents the classification score for a class $c\in\mathcal{C}$, s.t. attributions capture elements in an image that are associated with this class. In order to attribute uncertainties, we instead assign $F(\boldsymbol{x})=H(\boldsymbol{x})$, and thus combine scores across all classes with the aim of identifying pixels that \textit{confuse} the model.
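The gradient theorem can be checked numerically on a toy problem. The Python sketch below computes $\text{attr}^\delta_i$ along a deliberately curved path $\delta(\alpha) = \boldsymbol{x}^0 + \alpha(\boldsymbol{x}-\boldsymbol{x}^0) + \sin(\pi\alpha)\,\boldsymbol{v}$, where the detour $\boldsymbol{v}$ is arbitrary. It assumes a hypothetical three-class linear-softmax classifier on two features, central finite differences in place of automatic differentiation, and a trapezoidal rule over $\alpha$; the fiducial $\boldsymbol{x}^0=\boldsymbol{0}$ yields a uniform prediction under this toy model.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_entropy(x: Vector) -> float:
    """Entropy of a hypothetical 3-class linear-softmax classifier on 2 features."""
    weights = [[1.5, -0.5], [-1.0, 2.0], [0.3, 0.3]]
    probs = softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def numerical_grad(F: Callable[[Vector], float], x: Vector, h: float = 1e-5) -> Vector:
    """Central finite-difference gradient of a scalar function F at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((F(up) - F(down)) / (2 * h))
    return grad


def path_attributions(F, delta, ddelta, n: int, steps: int = 200) -> Vector:
    """attr_i = int_0^1 dF/d delta_i(a) * d delta_i(a)/da da, by the trapezoidal rule."""
    attr = [0.0] * n
    for k in range(steps + 1):
        alpha = k / steps
        weight = 0.5 if k in (0, steps) else 1.0
        grad, velocity = numerical_grad(F, delta(alpha)), ddelta(alpha)
        for i in range(n):
            attr[i] += weight * grad[i] * velocity[i] / steps
    return attr


# A curved (non-straight) path from x0 to x; the sine detour vanishes at both ends.
x0, x, detour = [0.0, 0.0], [2.0, -1.0], [0.5, -0.7]


def delta(a: float) -> Vector:
    return [p + a * (q - p) + math.sin(math.pi * a) * d for p, q, d in zip(x0, x, detour)]


def ddelta(a: float) -> Vector:
    return [(q - p) + math.pi * math.cos(math.pi * a) * d for p, q, d in zip(x0, x, detour)]


attr = path_attributions(toy_entropy, delta, ddelta, n=2)
gap = toy_entropy(x) - toy_entropy(x0)  # completeness: sum(attr) should match this
```

Up to discretisation error, \texttt{sum(attr)} matches \texttt{gap}, i.e. $F(\boldsymbol{x}) - F(\boldsymbol{x}^0)$ with $F = H$, irrespective of the shape of the detour.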

\textbf{Integrated Gradients}. Here, $\delta$ is parametrised as a straight path between a fiducial and the observed image, i.e. $\delta(\alpha) = \boldsymbol{x}^0 + \alpha (\boldsymbol{x}-\boldsymbol{x}^0)$, and the above simplifies to
$$\text{IG}_i(\boldsymbol{x}) = (x_i - x^0_i) \times \int_0^1 \frac{\partial H(\boldsymbol{x}^0 + \alpha (\boldsymbol{x}-\boldsymbol{x}^0))}{\partial x_i}d\alpha,$$
which corresponds to entropy attributions in Figure \ref{dog_un_expl} (see Section~\ref{app:Bayesian} in the supplementary material for its decomposition into aleatoric and epistemic attributions).
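A minimal sketch of $\text{IG}_i$ for the entropy follows, again in Python and under assumptions of our own: a hypothetical two-class linear-softmax classifier over a four-pixel image, finite-difference gradients, and a trapezoidal rule with 100 steps. With a black fiducial, the black second pixel receives exactly zero attribution because of the factor $(x_i - x^0_i)$, even though the entropy is sensitive to that pixel.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical 2-class linear-softmax classifier over a 4-pixel "image".
WEIGHTS = [[2.0, -1.0, 0.5, 1.0], [-1.5, 1.0, 0.5, -0.5]]
BIASES = [0.0, 0.3]


def entropy(x: Vector) -> float:
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(WEIGHTS, BIASES)]
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    probs = [e / sum(exps) for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def grad_entropy(x: Vector, h: float = 1e-5) -> Vector:
    """Central finite differences; an autodiff framework would be used in practice."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((entropy(up) - entropy(down)) / (2 * h))
    return grad


def integrated_gradients(x: Vector, x0: Vector, steps: int = 100) -> Vector:
    """IG_i = (x_i - x0_i) * int_0^1 dH/dx_i(x0 + a (x - x0)) da, trapezoidal rule."""
    avg_grad = [0.0] * len(x)
    for k in range(steps + 1):
        alpha = k / steps
        weight = 0.5 if k in (0, steps) else 1.0
        point = [p + alpha * (q - p) for p, q in zip(x0, x)]
        for i, g in enumerate(grad_entropy(point)):
            avg_grad[i] += weight * g / steps
    return [(q - p) * g for p, q, g in zip(x0, x, avg_grad)]


image = [0.9, 0.0, 0.4, 0.7]  # the second pixel is black
black = [0.0, 0.0, 0.0, 0.0]  # black fiducial
ig = integrated_gradients(image, black)
```

Summing \texttt{ig} recovers $H(\boldsymbol{x}) - H(\boldsymbol{x}^0)$ up to discretisation error.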

Integrated gradients offers an efficient approach to produce attributions with differentiable models, as an alternative to layer-wise relevance propagation \citep{montavon2019layer} or DeepLift \citep{shrikumar2017learning}, and there exist several adaptations and extensions \citep{DBLP:journals/corr/SmilkovTKVW17, xu2020attribution, kapishnikov2021guided}. However, attributions are heavily influenced by differences in pixel values between $\boldsymbol{x}$ and $\boldsymbol{x}^0$, and the fiducial commonly defaults to a black (or white) background. This fails to attribute importance to black (or white) pixels, for which $x_i - x^0_i = 0$, and is considered problematic \citep{sundararajan2017axiomatic}; this has motivated \textit{blurred} or \textit{black+white} alternatives \citep{lundberg2017unified, kapishnikov2019xrai}. Additionally, the straight path $\boldsymbol{x}^0 \rightsquigarrow \boldsymbol{x}$ traverses \textit{out-of-distribution} regions \citep{jha2020enhanced, NEURIPS2020_075b051e}, i.e. intermediary images not representative of training data, leading to noise and artefacts in attributions. 

\section{Methodology}\label{sec:methods}

We describe the proposed method for uncertainty attributions, summarised in Algorithm \ref{Algo}. This combines path integrals with a generative process that defines the domain of integration. We use a \textit{counterfactual} fiducial, a term here unrelated to counterfactuals in causal inference \citep{pmlr-v6-pearl10a}: an alternative \textit{in-distribution} image $\boldsymbol{x}^0$, similar to $\boldsymbol{x}$ according to a suitable metric, such that $f(\boldsymbol{x}^0)$ bears close to zero predictive uncertainty. 

We choose to leverage a \textit{variational auto-encoder} (VAE) as the generative model. As customary, this is composed of a standard Gaussian prior over a latent space of arbitrary dimensionality $m \ll n$, along with an image decoder $\psi:\mathbb{R}^m \rightarrow \mathbb{R}^n$. Here, $\boldsymbol{z}|\boldsymbol{x}\sim\mathcal{N}(\phi_\mu(\boldsymbol{x}), \phi_\sigma(\boldsymbol{x}))$ represents the approximate posterior in latent space, with mean and variance encoding functions $\phi_\mu,\phi_\sigma:\mathbb{R}^n \rightarrow \mathbb{R}^m$.

\begin{algorithm*}[ht]
\SetKwInOut{Input}{input}\SetKwInOut{Output}{output}
\Input{Feature vector $\boldsymbol{x}$, predictive distribution $f(\cdot)$ and distance metric $d(\cdot, \cdot)$. \\ VAE encoder $\phi(\cdot)$ and decoder $\psi(\cdot)$, penalty $\lambda \gg 0$ and learning rate $\nu>0$.}
\Output{Attributions $\text{attr}^{\delta_\psi}_i(\boldsymbol{x})$, $i=1,\dots, n$.}
Initialise $\boldsymbol{z}^0 = \boldsymbol{z} = \phi_\mu(\boldsymbol{x})$\;
Compute predicted class $\hat{c} = \argmax_i f_i(\boldsymbol{x})$\;
\While{\textit{$\mathcal{L}_1$ not converged}}{
$$\mathcal{L}_1 \gets d(\psi(\boldsymbol{z}^0), \boldsymbol{x}) + \frac{1}{2m} \sum_j (z^0_j)^2 - \lambda \log f_{\hat{c}}(\psi(\boldsymbol{z}^0)) \quad \text{and} \quad \boldsymbol{z}^0 \gets \boldsymbol{z}^0 - \nu\nabla_{\boldsymbol{z}^0} \mathcal{L}_1$$}
\While{\textit{$\mathcal{L}_2$ not converged}}{
$$\mathcal{L}_2 \gets d(\psi(\boldsymbol{z}), \boldsymbol{x}) + \frac{1}{2m} \sum_j z_j^2 \quad \text{and} \quad \boldsymbol{z} \gets \boldsymbol{z} - \nu\nabla_{\boldsymbol{z}} \mathcal{L}_2$$}
Approximate $\text{attr}^{\delta_\psi}_i(\boldsymbol{x})$, $i=1,\dots,n$ in \eqref{main_formula} along $\delta_{\psi, \boldsymbol{z}^0 \rightarrow \boldsymbol{z}}$ through trapezoidal integration.
\caption{Generative Uncertainty Attributions} \label{Algo}
\end{algorithm*}

\subsection{Domain of Integration} \label{domain_par}

The domain of integration is defined as a curve between the end-points $\boldsymbol{x}^0 \rightsquigarrow \boldsymbol{x}$. We select the fiducial as a decoded image $\boldsymbol{x}^0 = \psi(\boldsymbol{z}^0)$, where $\boldsymbol{z}^0$ is the solution to the constrained optimisation problem
\begin{align}
    \argmin_{\boldsymbol{z}\in\mathbb{R}^m} & \quad \Big[ d(\psi(\boldsymbol{z}), \boldsymbol{x}) + \frac{1}{2m} \sum_j z_j^2 \Big] \label{find_fiducial} \\
    \text{subject to} & \quad \lVert e_{\hat{c}} - f(\psi(\boldsymbol{z})) \rVert < \varepsilon \nonumber
\end{align}
for a small $\varepsilon>0$. Here, $\hat{c}=\argmax_i f_i(\boldsymbol{x})$ is the class predicted by the classifier, and $e_i$ is the standard basis (one-hot) vector at index $i$. The metric $d(\cdot, \cdot)$ may be chosen to be the cross-entropy or the mean absolute difference over pixel values in an image. The right-most term is the negative log-density (up to an additive constant and a scaling factor) of $\boldsymbol{z}$ under the standard Gaussian prior in a latent space of dimensionality $m>0$; this restricts the search \textit{in-distribution} and ensured robustness to overparametrisation of the latent space within our experiments. 

Hence, we retrieve a counterfactual fiducial which (i) is classified in the same class as $\boldsymbol{x}$ and (ii) bears close to zero predictive uncertainty. In practice, we approximate \eqref{find_fiducial} through the penalty method, i.e. an unconstrained search with a large penalty on 
$$d_{\mathcal{X}}(e_{\hat{c}}, f(\psi(\boldsymbol{z}))) = - \log f_{\hat{c}}(\psi(\boldsymbol{z})), $$
i.e. the cross-entropy between the one-hot vector $e_{\hat{c}}$ of the predicted class and the membership vector $f(\psi(\boldsymbol{z}))$ given a decoding $\psi(\boldsymbol{z})$. We proceed by gradient descent initialised at $\phi_\mu(\boldsymbol{x})$, the encoder's mean. 
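The penalty method can be sketched as plain gradient descent on the first objective in Algorithm \ref{Algo}. The Python code below is illustrative only and rests on several assumptions: a hypothetical linear decoder $\psi(\boldsymbol{z}) = A\boldsymbol{z}$, a hypothetical linear-softmax classifier, the mean absolute difference as $d(\cdot,\cdot)$, finite-difference gradients, and a fixed iteration budget standing in for the convergence check. The starting point \texttt{z\_init} plays the role of the encoder mean $\phi_\mu(\boldsymbol{x})$.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical toy set-up: linear decoder psi(z) = A z (m = 2 latent, n = 4 pixels)
# and a 2-class linear-softmax classifier. Neither stands for a trained model.
A = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, -0.5]]
W = [[2.0, -1.0, 0.5, 1.0], [-1.5, 1.0, 0.5, -0.5]]


def decode(z: Vector) -> Vector:
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def probs(x: Vector) -> Vector:
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    return [e / sum(exps) for e in exps]


def entropy(x: Vector) -> float:
    return -sum(p * math.log(p) for p in probs(x) if p > 0.0)


def fiducial_loss(z: Vector, x: Vector, c_hat: int, lam: float) -> float:
    """d(psi(z), x) + 1/(2m) sum_j z_j^2 - lam * log f_c_hat(psi(z))."""
    x_rec = decode(z)
    distance = sum(abs(r - xi) for r, xi in zip(x_rec, x)) / len(x)  # mean absolute difference
    prior = sum(zj * zj for zj in z) / (2 * len(z))  # scaled Gaussian negative log-density
    return distance + prior - lam * math.log(probs(x_rec)[c_hat])


def find_fiducial(x: Vector, z_init: Vector, lam: float = 10.0, lr: float = 0.01,
                  iters: int = 1000, h: float = 1e-6):
    probs_x = probs(x)
    c_hat = max(range(len(probs_x)), key=probs_x.__getitem__)  # predicted class
    z0 = list(z_init)
    for _ in range(iters):  # fixed budget in place of a convergence check
        grad = []
        for j in range(len(z0)):
            up, down = list(z0), list(z0)
            up[j] += h
            down[j] -= h
            diff = fiducial_loss(up, x, c_hat, lam) - fiducial_loss(down, x, c_hat, lam)
            grad.append(diff / (2 * h))
        z0 = [zj - lr * g for zj, g in zip(z0, grad)]
    return z0, c_hat


x = [0.6, 0.5, 0.55, 0.05]
z_init = [0.6, 0.5]  # stands in for the encoder mean phi_mu(x); psi(z_init) is ~x here
z0, c_hat = find_fiducial(x, z_init)
```

After the search, the decoded fiducial is still assigned to $\hat{c}$, but with a larger membership probability and hence lower entropy; in this toy set-up, a larger \texttt{lam} would typically push $H(\boldsymbol{x}^0)$ closer to zero at the expense of distance to $\boldsymbol{x}$.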

\begin{figure}[!ht]
\centering
\includegraphics[width=0.48\textwidth]{images/diagram_path}
\caption{Procedural sketch to generate a path of integration. Here, \textit{fiducial} $\boldsymbol{z}^0$ and \textit{reconstruction} $\boldsymbol{z}$ points are optimised in latent space by gradient descent, starting from the encoding of $\boldsymbol{x}$ (dashed lines). A connecting straight path (in blue) is projected onto the data manifold and augmented with an interpolating component (in red).} \label{diagram_path}
\end{figure}

\textbf{Integration Path}. We further leverage the decoder as a generative process to parametrise a curve $\delta_\psi:[0, 1] \rightarrow \mathbb{R}^n$, by following the steps displayed in Figure \ref{diagram_path}, s.t. $\delta_\psi(\alpha) = \psi(\boldsymbol{z}^0 + \alpha (\boldsymbol{z} - \boldsymbol{z}^0))$ where
$$\boldsymbol{z} = \argmin_{\boldsymbol{z}\in\mathbb{R}^m} \Big[ d(\psi(\boldsymbol{z}), \boldsymbol{x}) +
\frac{1}{2m} \sum_j z_j^2 \Big]$$ 
is also optimised by gradient descent initialised at $\phi_\mu(\boldsymbol{x})$. This is an unconstrained optimisation problem analogous to \eqref{find_fiducial}. Consequently, the path $\delta_\psi$ offers a trajectory between a counterfactual $\delta_\psi(0) = \psi(\boldsymbol{z}^0) = \boldsymbol{x}^0$ and a reconstruction $\delta_\psi(1) = \psi(\boldsymbol{z})$ of the image $\boldsymbol{x}$. In order to correct for mild reconstruction errors, we finally augment the domain of integration with a \textit{vanilla} straight path between the end-points $\psi(\boldsymbol{z}) \rightsquigarrow \boldsymbol{x}$. We display a few examples of this procedure on MNIST digits in Figure~\ref{Mnist_paths}. Overall, the differences in predictive entropy or model scores between a reconstruction $\psi(\boldsymbol{z})$ and its original counterpart $\boldsymbol{x}$ were not observed to be significant within our experiments.
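To make the two-piece domain of integration explicit, the following Python sketch discretises it: a straight line in latent space pushed through the decoder, followed by the straight pixel-space segment $\psi(\boldsymbol{z}) \rightsquigarrow \boldsymbol{x}$. The decoder \texttt{toy\_decode}, the latent codes and the grid sizes are hypothetical choices for illustration.

```python
from typing import Callable, List

Vector = List[float]


def integration_path(decode: Callable[[Vector], Vector], z0: Vector, z: Vector, x: Vector,
                     n_latent: int = 50, n_pixel: int = 10) -> List[Vector]:
    """Discretised augmented path x^0 = psi(z0) -> psi(z) -> x.

    First segment: decoded straight line in latent space, psi(z0 + a (z - z0)).
    Second segment: straight line in pixel space from the reconstruction psi(z) to x.
    """
    latent_part = [
        decode([p + (t / n_latent) * (q - p) for p, q in zip(z0, z)])
        for t in range(n_latent + 1)
    ]
    recon = latent_part[-1]
    pixel_part = [
        [r + (t / n_pixel) * (xi - r) for r, xi in zip(recon, x)]
        for t in range(1, n_pixel + 1)
    ]
    return latent_part + pixel_part


# Hypothetical non-linear decoder from 2 latent dimensions to 3 pixels.
def toy_decode(z: Vector) -> Vector:
    return [z[0] ** 2, z[0] * z[1], z[1]]


path = integration_path(toy_decode, z0=[1.0, -1.0], z=[0.2, 0.4], x=[0.05, 0.1, 0.42])
```

The first point is the fiducial $\psi(\boldsymbol{z}^0)$, the point at index \texttt{n\_latent} is the reconstruction $\psi(\boldsymbol{z})$, and the last point is $\boldsymbol{x}$ itself.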

\begin{figure}[!ht]
\includegraphics[width=0.48\textwidth]{images/path_expl}
\caption{An example of \textit{in-distribution} curves connecting fiducial (left-most) and real (right-most) data points, on MNIST digits data. Digits on the left bear no predictive uncertainty in classification.} \label{Mnist_paths}
\end{figure}

\begin{figure*}[!ht]
\centering
\includegraphics[width=\textwidth]{images/feature_face}
\caption{Comparison of uncertainty attributions on a \textit{CelebA} image. We compare attributions for three classifiers, which measure the presence (or lack) of \textit{smiles} (left), \textit{arched eyebrows} (centre), and \textit{bags under eyes} (right). Red pixels contribute by increasing uncertainties, while green pixels contribute towards decreasing uncertainties.} \label{feature_face}
\end{figure*}

\subsection{Line Integral for Attributions}

For simplicity, we restrict the formulae to the \textit{in-distribution} component along the curve $\delta_\psi:[0, 1] \rightarrow \mathbb{R}^n$ defined in Subsection \ref{domain_par}, and we ignore the straight path connecting $\psi(\boldsymbol{z}) \rightsquigarrow \boldsymbol{x}$. We require the total differential of the entropy $H(\cdot)$ wrt $\boldsymbol{z}$ in latent space; however, we wish to retrieve importances for features $\boldsymbol{x}$ in the original data manifold within $\mathbb{R}^n$. To this end, the attribution at index $i=1,\dots,n$ is given by
\begin{align}
\text{attr}^{\delta_\psi}_i(\boldsymbol{x}) = \sum_{j=1}^m (z_j - z^0_j) \int_0^1 
\frac{\partial H(\delta_\psi(\alpha))}{\partial\delta_{\psi,i}(\alpha)}
\frac{\partial \delta_{\psi,i}(\alpha)}{\partial z_j}  d\alpha.
\label{main_formula}
\end{align}
Intuitively, we compute the total derivative of $H(\cdot)$ wrt $\alpha$ in the integration path, using the chain rule. We decompose the calculation over indices in pixel space, and further undertake summation over contributions in latent space. In Figure \ref{feature_face}, we show an example that compares attributions in \eqref{main_formula} versus vanilla integrated gradients. There, we find a \textit{CelebA} image \citep{liu2015faceattributes} with tags for the presence of a \textit{smile}, \textit{arched eyebrows} and \textit{no bags under the eyes}. 
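The attribution \eqref{main_formula} can be approximated with the trapezoidal rule, as in the last step of Algorithm \ref{Algo}. The Python sketch below assumes a hypothetical sigmoid decoder, whose Jacobian $\partial\delta_{\psi,i}/\partial z_j$ is available in closed form, and a hypothetical linear-softmax classifier, for which $\partial H/\partial l_k = -p_k(\log p_k + H)$ with respect to the logits $l_k$. The inner sum over $j$ is the chain-rule term $d\delta_{\psi,i}(\alpha)/d\alpha$.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical toy model: sigmoid decoder from m = 2 latent dims to n = 4 pixels,
# followed by a 2-class linear-softmax classifier.
A = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, -0.5]]
W = [[2.0, -1.0, 0.5, 1.0], [-1.5, 1.0, 0.5, -0.5]]


def decode(z: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-sum(a * zj for a, zj in zip(row, z)))) for row in A]


def decoder_jacobian(z: Vector) -> Matrix:
    # d psi_i / d z_j = psi_i (1 - psi_i) A_ij for this sigmoid decoder
    return [[xi * (1.0 - xi) * a for a in row] for xi, row in zip(decode(z), A)]


def probs(x: Vector) -> Vector:
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    return [e / sum(exps) for e in exps]


def entropy(x: Vector) -> float:
    return -sum(p * math.log(p) for p in probs(x))


def entropy_grad(x: Vector) -> Vector:
    # dH/dlogit_k = -p_k (log p_k + H), chained through the linear classifier
    p, h = probs(x), entropy(x)
    d_logits = [-pk * (math.log(pk) + h) for pk in p]
    return [sum(W[k][i] * d_logits[k] for k in range(len(W))) for i in range(len(x))]


def latent_path_attributions(z0: Vector, z: Vector, steps: int = 200) -> Vector:
    """attr_i = sum_j (z_j - z0_j) int_0^1 dH/d delta_i * d delta_i/d z_j da (trapezoid)."""
    dz = [q - p for p, q in zip(z0, z)]
    attr = [0.0] * len(A)
    for k in range(steps + 1):
        alpha = k / steps
        weight = 0.5 if k in (0, steps) else 1.0
        z_alpha = [p + alpha * d for p, d in zip(z0, dz)]
        grad, jac = entropy_grad(decode(z_alpha)), decoder_jacobian(z_alpha)
        for i in range(len(A)):
            chain = sum(jac[i][j] * dz[j] for j in range(len(dz)))  # d delta_i / d alpha
            attr[i] += weight * grad[i] * chain / steps
    return attr


z0 = [2.0, -1.5]  # hypothetical low-entropy fiducial code
z = [0.3, 0.2]    # hypothetical reconstruction code
attr = latent_path_attributions(z0, z)
```

Summing \texttt{attr} over pixels recovers $H(\psi(\boldsymbol{z})) - H(\psi(\boldsymbol{z}^0))$ up to discretisation error, i.e. the completeness property restricted to the latent segment of the path.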

\subsection{Properties}

Due to \textit{path independence} of line integrals of gradient fields, and noting that $H(\boldsymbol{x}^0) \approx 0$ by construction, importances drawn from a trajectory $\delta_\psi(\cdot)$ as parametrised in Subsection \ref{domain_par} (including the straight augmentation $\psi(\boldsymbol{z}) \rightsquigarrow \boldsymbol{x}$) will approximately account for \textbf{all} of the uncertainty in a posterior predictive task, i.e. 
$$H(\boldsymbol{x}) \approx H(\boldsymbol{x}) - H(\boldsymbol{x}^0) = \int_0^1\nabla H(\delta_\psi(\alpha)) \cdot \frac{d\delta_\psi(\alpha)}{d\alpha}\, d\alpha = \sum_{i=1}^n\text{attr}^{\delta_\psi}_i(\boldsymbol{x}),$$
and this is commonly referred to as \textit{completeness}. Additionally, the reliance on path derivatives, along with the chain rule for composite functions, ensures that the axioms of \textit{sensitivity(b)} (i.e. the \textit{dummy property}) and \textit{implementation invariance} are inherited; we refer the reader to \citet{friedman2004paths, sundararajan2017axiomatic} for the technical details. Importantly, the attribution will be zero for any index which does not influence the classifier. 

\subsubsection{The Role of the Autoencoder}

A VAE is arguably not the best generative model for reconstructing sharp images with high fidelity. However, it is stable during training and efficient in sampling; furthermore, the encoder provides a means to efficiently select starting values $\phi_\mu(\boldsymbol{x})$ during latent optimisation tasks \citep[cf.][]{antoran2021getting}. In Section \ref{app:robustness} within the supplementary material, we offer a robustness assessment of our results to variations in the autoencoder, and we report negligible changes in performance. We attribute this consistency, which holds even in large overparametrised latent spaces, to the Gaussian prior terms in the optimisation procedures of Subsection \ref{domain_par} that define the integration path. 

Alternative models can be used to define integration paths. \textit{Generative adversarial networks} have gained relevance as a means to facilitate interpretability in classification tasks \citep{lang2021explaining}; however, their training can be unstable and identifying counterfactual references is infeasible. This also presents a problem with \textit{autoregressive models} \citep{van2016conditional}, which are additionally inefficient in sampling and would entail long optimisation times in latent space.

\subsubsection{Non-Generative Integration Paths} \label{sec:straight_path}

For simplicity, a counterfactual fiducial image $\boldsymbol{x}^0 = \psi(\boldsymbol{z}^0)$ as described in \eqref{find_fiducial} can also be combined with a straight or \textit{guided} \citep{kapishnikov2021guided} integration path $\psi(\boldsymbol{z}^0) \rightsquigarrow \boldsymbol{x}$. In application to simple grey-scale images, this path is unlikely to traverse \textit{out-of-distribution} regions, due to the proximity between the fiducial and the original image $\boldsymbol{x}$. In our experiments, we test these variants and report that they fare relatively well in explainability tasks with simple images; however, their performance degrades on complex RGB pictures involving facial features. 

\section{Experiments}\label{sec:experiments}

Uncertainty attributions are commonly facilitated through generative and adversarial models, and can thus be computationally expensive to produce. Consequently, they have traditionally only been evaluated on simple data sets \citep[cf.][]{antoran2021getting, schut2021generating}. Here, we similarly apply our proposed methodology to classification models in the image repositories \textit{MNIST handwritten digits} \citep{MNISTdb} and \textit{fashion-MNIST} \citep{xiao2017fashion}. However, we also extend evaluation tasks to high resolution facial images in \textit{CelebA} \citep{liu2015faceattributes}.

We evaluate the performance both quantitatively and qualitatively, and we compare the results to path methods including \textit{vanilla} integrated gradients \citep{sundararajan2017axiomatic}, as well as \textit{blur} and \textit{guided} variants \citep{xu2020attribution, kapishnikov2021guided}. We test these approaches with \textit{plain}, \textit{black+white} (B+W) and \textit{counterfactual} fiducials, and we combine the saliency maps with \textit{Xrai} \citep{kapishnikov2019xrai}, a popular segmentation and attribution approach. We also evaluate pure counterfactual approaches to uncertainty attribution, which assign importances by directly comparing pixel values between an image and its counterfactual; for this, we include the recent \textit{CLUE} attributions \citep{antoran2021getting} in the assessment. For completeness, we finally add adaptations of \textit{LIME} \citep{ribeiro2016should} and \textit{kernelSHAP} \citep{kernelshap}. Implementation details are found in the supplementary Section~\ref{app:Impl}. Source code for reproducing results can be found at \href{http://github.com/Featurespace/uncertainty-attribution}{github.com/Featurespace/uncertainty-attribution}.

\subsection{Performance Metrics}

In order to produce quantitative evaluations, we resort to \textit{smallest sufficient region} methods popularised in recent literature \citep[see][]{petsiuk2018rise, kapishnikov2019xrai, covert2020feature, lundberg2020local}, which evaluate the quality of saliency maps in the absence of ground truths. These are suitable for our repeated assessments over multiple methods and data sets, as they do not require specialised model retraining \citep[cf.][]{hooker2019benchmark, jethani2021have}. The methods proceed by revealing pixels from a masked image, in order of importance as determined by attribution values, while changes in classification scores, predictive entropy or image information content are monitored. Alternatively, the process may be carried out in reverse by removing or resampling pixels from the original image, and we show an example of this process in Figure \ref{fig:show_metric}. We use blurring as a masking mechanism \citep[cf.][]{kapishnikov2019xrai}, since other alternatives lead to masked images significantly out of distribution, i.e. not representative of training data. We evaluate one inclusion and one removal metric, both suited to measuring changes in predictive uncertainty.

\begin{figure}[t!]
\centering
\includegraphics[width=0.485\textwidth]{images/blur_out.png}
\caption{Normalised variation in predictive entropy (decreasing, blue) and image information content (increasing, orange) as pixels most contributing to uncertainty are sequentially blurred. Classification task on digits (left), bags under eyes (centre) and smiles (right). Information content approximated by compressed file sizes.} \label{fig:show_metric}
\end{figure}

\textbf{Inclusion Methods.} We measure the \textit{entropy information curve} (EIC) analogously to the \textit{performance information curves} (PICs) discussed in \cite{kapishnikov2019xrai, kapishnikov2021guided}. For an image $\boldsymbol{x}$ with $n$ pixels, we define a sequence $\{\boldsymbol{x}^i\}_{i=0,\dots,n}$ that transitions from a blurred reference $\boldsymbol{x}^0 = \boldsymbol{x}_{\textit{blurred}}$ towards $\boldsymbol{x}^n = \boldsymbol{x}$, by revealing pixels in order of their contribution to decreasing the entropy. We evaluate
\begin{equation*}
\textit{EIC}_i =\frac{1}{|\mathcal{X}|}\sum_{\boldsymbol{x}\in\mathcal{X}} \frac{H(\boldsymbol{x}^i)}{H(\boldsymbol{x}_{\textit{blurred}})} 
\end{equation*}
across indices in the transition $\boldsymbol{x}_{\textit{blurred}}\xrightarrow{i=1,\dots,n}\boldsymbol{x}$, which retrieves an average over images in each data set $\mathcal{X}$ (in the presence of significant outliers, we report median values). The EIC measures the variation in overall predictive entropy and can be computed on unlabelled data. It is assessed against the information content of the images as pixels are revealed \citep{kapishnikov2019xrai, kapishnikov2021guided}, which can be approximated by compressed file sizes or the second-order Shannon entropy. 
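A sketch of the EIC computation in Python is given below. It assumes that revealing pixels ``in order of their contribution to decreasing the entropy'' means sorting attributions in ascending order (most negative first), and it indexes the curve by pixel count rather than by information content; the toy classifier, images, blurred references and attribution maps are all hypothetical stand-ins.

```python
import math
from statistics import mean, median
from typing import Callable, List, Sequence

Vector = List[float]


def reveal_sequence(x: Vector, x_blurred: Vector, attributions: Sequence[float]) -> List[Vector]:
    """x^0 = blurred image, x^n = x; pixels are revealed most entropy-decreasing first."""
    order = sorted(range(len(x)), key=lambda i: attributions[i])  # ascending attribution
    current = list(x_blurred)
    sequence = [list(current)]
    for i in order:
        current[i] = x[i]
        sequence.append(list(current))
    return sequence


def entropy_information_curve(images, blurred, attributions,
                              entropy: Callable[[Vector], float], aggregate=mean) -> List[float]:
    """EIC_i = aggregate over images of H(x^i) / H(x_blurred), for i = 0..n."""
    per_image = []
    for x, x_b, attr in zip(images, blurred, attributions):
        reference = entropy(x_b)
        per_image.append([entropy(x_i) / reference for x_i in reveal_sequence(x, x_b, attr)])
    return [aggregate(column) for column in zip(*per_image)]


def toy_entropy(x: Vector) -> float:
    # Hypothetical binary classifier with logit difference 3 x_0 - 2 x_1 + x_2
    margin = 3 * x[0] - 2 * x[1] + x[2]
    p = 1 / (1 + math.exp(-margin))
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


images = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.1]]
blurred = [[0.5, 0.5, 0.5], [0.37, 0.37, 0.37]]  # stand-ins for blurred copies
attrs = [[-0.3, -0.1, 0.05], [0.2, -0.4, 0.0]]     # stand-ins for attribution maps
curve = entropy_information_curve(images, blurred, attrs, toy_entropy)
robust_curve = entropy_information_curve(images, blurred, attrs, toy_entropy, aggregate=median)
```

Passing \texttt{aggregate=median} gives the robust variant used in the presence of significant outliers.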

\textbf{Best Removal Methods.} We measure \textit{uncertainty reduction curves}, i.e. the relative uncertainty that an attribution method can remove from an image $\boldsymbol{x}$. We use the inverse sequence $\{\boldsymbol{x}^{i}\}_{i=0,\dots,n}$, which transitions from $\boldsymbol{x}^0 = \boldsymbol{x}$ towards a \textit{blurred} image $\boldsymbol{x}^n = \boldsymbol{x}_{\textit{blurred}}$. We evaluate
\begin{equation*}
\textit{URC}_i = \frac{1}{|\mathcal{X}|}\sum_{\boldsymbol{x}\in\mathcal{X}} \max_{r \leq i} \bigg[ 1 - \frac{H(\boldsymbol{x}^r)}{H(\boldsymbol{x})} \bigg],
\end{equation*}
i.e. the best percentage reduction in predictive uncertainty that can be explained away by \textit{blurring} up to $i$ pixels, in decreasing order of contribution to uncertainty.
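The running maximum in \textit{URC} is straightforward to compute once the entropies along each removal sequence are available. The Python sketch below takes those entropy sequences as input (how they are produced, e.g. by blurring pixels in decreasing order of attribution, is left outside the sketch), and the numbers in the example are made up for illustration.

```python
from typing import List, Sequence


def uncertainty_reduction_curve(entropy_sequences: Sequence[Sequence[float]]) -> List[float]:
    """URC_i = mean over images of max_{r <= i} [1 - H(x^r) / H(x)].

    Each inner sequence holds [H(x^0), ..., H(x^n)] with x^0 = x (nothing blurred yet),
    pixels being blurred in decreasing order of contribution to uncertainty.
    """
    curves = []
    for seq in entropy_sequences:
        h_original = seq[0]
        best, curve = 0.0, []  # r = 0 gives 1 - H(x)/H(x) = 0
        for h in seq:
            best = max(best, 1.0 - h / h_original)
            curve.append(best)
        curves.append(curve)
    return [sum(column) / len(column) for column in zip(*curves)]


urc = uncertainty_reduction_curve([[0.5, 0.4, 0.45, 0.1, 0.3], [1.0, 0.9, 0.95, 0.9, 0.2]])
```

For the two made-up sequences, the per-image curves are $(0, 0.2, 0.2, 0.8, 0.8)$ and $(0, 0.1, 0.1, 0.1, 0.8)$, so the averaged curve is $(0, 0.15, 0.15, 0.45, 0.8)$; by construction, the curve is non-decreasing.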

\subsection{Quantitative Evaluation}

\begin{table*}[ht]
    \centering
    \caption{Area over the entropy information curve and percentile points in uncertainty reduction curves, across attribution methods and classification tasks. Metrics computed with respect to the approximated and normalised information content of images.} \label{tab:EIC_URC}
    \setlength{\tabcolsep}{4pt}
    \renewcommand{\arraystretch}{1.3}
    \resizebox{0.98\textwidth}{!}{
        \begin{tabular}{l|ccccc|ccllllllll|}
\multicolumn{1}{c|}{\multirow{3}{*}{Method}} & \multicolumn{5}{c|}{Area over Entropy Information Curve}                                                                                                                                                                                & \multicolumn{10}{c|}{Uncertainty Reduction Curve}                                                                                                                                                                                                   \\ \cline{2-16} 
\multicolumn{1}{c|}{}                        & \multicolumn{1}{l}{\multirow{2}{*}{Mnist}} & \multicolumn{1}{l}{\multirow{2}{*}{Fashion}} & \multicolumn{1}{l}{\multirow{2}{*}{Smiles}} & \multicolumn{1}{l}{\multirow{2}{*}{Eyebrows}} & \multicolumn{1}{l|}{\multirow{2}{*}{Eyebags}} & \multicolumn{2}{c}{Mnist}       & \multicolumn{2}{c}{Fashion}                       & \multicolumn{2}{c}{Smiles}                         & \multicolumn{2}{c}{Eyebrows}                       & \multicolumn{2}{c|}{Eyebags}                        \\
\multicolumn{1}{c|}{}                        & \multicolumn{1}{l}{}                       & \multicolumn{1}{l}{}                         & \multicolumn{1}{l}{}                        & \multicolumn{1}{l}{}                          & \multicolumn{1}{l|}{}                         & 1\%            & 5\%            & \multicolumn{1}{c}{1\%} & \multicolumn{1}{c}{5\%} & \multicolumn{1}{c}{5\%} & \multicolumn{1}{c}{10\%} & \multicolumn{1}{c}{5\%} & \multicolumn{1}{c}{10\%} & \multicolumn{1}{c}{5\%} & \multicolumn{1}{c|}{10\%} \\ \hline
Vanilla IG                                   & 0.998                                      & 0.759                                        & 0.354                                       & 0.155                                         & 0.143                                         & 0.469          & 0.508          & 0.109                   & 0.196                   & 0.076                   & 0.085                    & 0.097                   & 0.104                    & 0.117                   & 0.131                     \\
+ (B+W)                                      & \textbf{0.999}                             & 0.901                                        & 0.584                                       & 0.422                                         & 0.361                                         & 0.379          & 0.631          & 0.083                   & 0.217                   & 0.149                   & 0.185                    & 0.209                   & 0.233                    & 0.146                   & 0.195                     \\
+ Counterfactual                             & \textbf{0.999}                             & 0.909                                        & 0.600                                       & 0.396                                         & 0.325                                         & \textbf{0.751} & \textbf{0.872} & \textbf{0.217}          & \textbf{0.431}          & 0.176                   & 0.215                    & 0.213                   & 0.244                    & 0.153                   & 0.179                     \\ \hline
Blur IG                                      & 0.973                                      & 0.818                                        & 0.368                                       & 0.144                                         & 0.136                                         & 0.017          & 0.102          & 0.016                   & 0.076                   & 0.015                   & 0.019                    & 0.014                   & 0.017                    & 0.008                   & 0.009                     \\
Guided IG                                    & 0.996                                      & 0.655                                        & 0.333                                       & 0.134                                         & 0.119                                         & 0.222          & 0.291          & 0.009                   & 0.035                   & 0.014                   & 0.017                    & 0.016                   & 0.023                    & 0.009                   & 0.012                     \\
+ (B+W)                                      & 0.997                                      & 0.735                                        & 0.318                                       & 0.151                                         & 0.130                                         & 0.115          & 0.283          & 0.006                   & 0.036                   & 0.017                   & 0.018                    & 0.035                   & 0.046                    & 0.008                   & 0.013                     \\
+ Counterfactual                             & \textbf{0.999}                             & 0.879                                        & 0.360                                       & 0.277                                         & 0.206                                         & 0.715          & 0.833          & 0.168                   & 0.326                   & 0.056                   & 0.063                    & 0.137                   & 0.152                    & 0.062                   & 0.081                     \\
Generative IG                                & \textbf{0.999}                             & \textbf{0.920}                               & \textbf{0.737}                              & \textbf{0.429}                                & \textbf{0.433}                                & 0.747          & 0.866          & 0.201                   & 0.386                   & \textbf{0.318}          & \textbf{0.389}           & \textbf{0.243}          & \textbf{0.278}           & \textbf{0.173}          & \textbf{0.233}            \\ \hline
LIME                                         & 0.993                                      & 0.630                                        & 0.231                                       & 0.088                                         & 0.140                                         & 0.000          & 0.021          & 0.001                   & 0.011                   & 0.011                   & 0.015                    & 0.009                   & 0.016                    & 0.009                   & 0.019                     \\ \hline
SHAP                                         & 0.994                                      & 0.900                                        &                                             &                                               &                                               & 0.119          & 0.319          & 0.080                   & 0.222                   & \multicolumn{1}{c}{}    & \multicolumn{1}{c}{}     & \multicolumn{1}{c}{}    & \multicolumn{1}{c}{}     & \multicolumn{1}{c}{}    & \multicolumn{1}{c|}{}     \\
+ Counterfactual                             & 0.985                                      & 0.839                                        &                                             &                                               &                                               & 0.515          & 0.683          & 0.165                   & 0.302                   & \multicolumn{1}{c}{}    & \multicolumn{1}{c}{}     & \multicolumn{1}{c}{}    & \multicolumn{1}{c}{}     & \multicolumn{1}{c}{}    & \multicolumn{1}{c|}{}     \\ \hline
CLUE                                         & 0.969                                      & 0.659                                        & 0.349                                       & 0.177                                         & 0.135                                         & 0.264          & 0.289          & 0.042                   & 0.076                   & 0.028                   & 0.031                    & 0.043                   & 0.050                    & 0.007                   & 0.010                     \\ \hline
XRAI + IG                                    & 0.991                                      & 0.750                                        & 0.541                                       & 0.230                                         & 0.156                                         & 0.023          & 0.093          & 0.010                   & 0.037                   & 0.053                   & 0.101                    & 0.036                   & 0.056                    & 0.018                   & 0.028                     \\
+ (B+W)                                      & 0.992                                      & 0.811                                        & 0.637                                       & 0.312                                         & 0.236                                         & 0.002          & 0.035          & 0.009                   & 0.044                   & 0.121                   & 0.206                    & 0.067                   & 0.103                    & 0.028                   & 0.057                     \\
+ Counterfactual                             & 0.952                                      & 0.648                                        & 0.267                                       & 0.235                                         & 0.243                                         & 0.248          & 0.425          & 0.057                   & 0.148                   & 0.098                   & 0.144                    & 0.134                   & 0.227                    & 0.102                   & 0.183                     \\
XRAI + GIG                                   & 0.990                                      & 0.671                                        & 0.173                                       & 0.098                                         & 0.054                                         & 0.012          & 0.054          & 0.003                   & 0.016                   & 0.019                   & 0.030                    & 0.006                   & 0.012                    & 0.003                   & 0.005                     \\
+ (B+W)                                      & 0.988                                      & 0.699                                        & 0.118                                       & 0.120                                         & 0.043                                         & 0.001          & 0.018          & 0.002                   & 0.012                   & 0.021                   & 0.032                    & 0.016                   & 0.027                    & 0.002                   & 0.004                     \\
+ Counterfactual                             & 0.960                                      & 0.622                                        & 0.094                                       & 0.222                                         & 0.115                                         & 0.202          & 0.391          & 0.028                   & 0.087                   & 0.012                   & 0.013                    & 0.082                   & 0.107                    & 0.010                   & 0.019                     \\
XRAI + Gen IG                                & 0.971                                      & 0.710                                        & 0.512                                       & 0.240                                         & 0.275                                         & 0.245          & 0.415          & 0.047                   & 0.129                   & 0.179                   & 0.243                    & 0.141                   & 0.224                    & 0.113                   & 0.190                    
\end{tabular}}
    \end{table*}

In Table \ref{tab:EIC_URC} we report (i) the area over the entropy information curve and (ii) percentile points in the uncertainty reduction curve, for the attribution methods and data sets analysed in this paper. We explore five classification tasks, including the presence of \textit{smiles}, \textit{arched eyebrows} and \textit{bags under the eyes} in CelebA images. In all cases, higher values indicate better estimated performance. The metrics are evaluated on held-out images that were not used during model training. Attribution methods are implemented with default parameters where available; details are given in Section~\ref{app:Impl} of the supplementary material. Blurring is performed with a Gaussian kernel whose standard deviation is tuned individually for each classification task: we choose the minimum standard deviation such that the model's predictive uncertainty on fully blurred images is maximised. \textit{KernelSHAP} evaluations are reported only for data sets with low-resolution images, owing to the computational cost of performing the recommended number of image perturbations.
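
For concreteness, the blur selection rule above may be sketched as follows. The notation is introduced here for illustration only and rests on our own assumptions: $G_\sigma$ denotes a Gaussian kernel with standard deviation $\sigma$, $*$ denotes convolution, $\Sigma$ is a finite grid of candidate standard deviations, $\mathcal{D}$ is a set of held-out images, and predictive uncertainty is taken to be the predictive entropy $\mathrm{H}[p(y \mid x)] = -\sum_{c} p(y = c \mid x) \log p(y = c \mid x)$ averaged over $\mathcal{D}$,
\begin{equation*}
\bar{\mathrm{H}}(\sigma) = \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \mathrm{H}\big[p(y \mid G_\sigma * x)\big], \qquad \sigma^\star = \min \Big\{ \sigma \in \Sigma : \bar{\mathrm{H}}(\sigma) = \max_{\sigma' \in \Sigma} \bar{\mathrm{H}}(\sigma') \Big\}.
\end{equation*}
In practice, the equality may be relaxed to a small tolerance, since several standard deviations may attain a near-maximal averaged entropy.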

The results show that the generative method presented in this paper is better suited both to explaining variations in predictive entropy and to explaining away sources of uncertainty. They also suggest that the improvements over the explored alternatives are most significant in classification tasks on high-resolution images of facial features. For low-resolution greyscale images, the results further show that popular attribution approaches such as integrated gradients, guided integrated gradients or SHAP require a counterfactual fiducial to perform well, and this fiducial must still be produced through a generative model. In these cases, good performance is a consequence of the low dissimilarity between an image and its baseline (see Subsection \ref{sec:straight_path}), such that simple integration paths remain in distribution.
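
To make the role of the fiducial explicit, we recall the standard straight-line integrated gradients attribution, written here, as an illustration, with the predictive entropy $\mathrm{H}(x) \coloneqq \mathrm{H}[p(y \mid x)]$ as the target and a fiducial $x'$ (for instance, a counterfactual) as the baseline; this notation is ours and may differ from that of Subsection~\ref{sec:straight_path},
\begin{equation*}
\mathrm{IG}_i(x; x') = (x_i - x'_i) \int_0^1 \frac{\partial \mathrm{H}}{\partial x_i}\big(x' + \alpha (x - x')\big) \, \mathrm{d}\alpha, \qquad \sum_i \mathrm{IG}_i(x; x') = \mathrm{H}(x) - \mathrm{H}(x').
\end{equation*}
The completeness identity on the right shows that the attributions split the entropy gap between the image and its fiducial. When $x'$ is a counterfactual close to $x$, every point $x' + \alpha(x - x')$ on the segment remains near the image, so gradients are evaluated close to the data distribution; a distant baseline, such as a black or white image, offers no such guarantee.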

In all cases, segmentation-based interpretability methods such as XRAI or \textit{LIME} perform comparatively worse. We attribute this to the difficulty of segmenting the images in the data sets selected for this evaluation.

\begin{figure}[b!]
\centering
\includegraphics[width=0.48\textwidth]{images/compare_blurring_urc.png}
\caption{Uncertainty reduction curves for the best-performing attribution methods on \textit{bags under the eyes}, CelebA data. Left: the blurring standard deviation is set to the minimum feasible value. Right: an arbitrarily large standard deviation is used.} \label{fig:urc_plot}
\end{figure}

\textbf{Blurring Settings.} Evaluations are highly sensitive to the standard deviation of the Gaussian kernel. Large standard deviations produce blurred images that lie far out of distribution, which degrades the measured performance of all attribution methods, as shown by the URCs in Figure \ref{fig:urc_plot} for the classification model for \textit{bags under the eyes} on CelebA data. The results in Table \ref{tab:EIC_URC} therefore represent the \textit{best} measured performances. We also note that attributions produced in combination with XRAI \citep{kapishnikov2019xrai} remain consistent across evaluations, owing to pre-processing and pixel segmentation that lead to highly clustered importances.

\textbf{Autoencoder Settings.} The performance of our proposed method plateaus once the latent space representation reaches a certain dimensionality. Further increasing the complexity of the autoencoder, or changing its training scheme, yields consistent results. This is a consequence of the regularisation terms imposed on the optimisation tasks in \eqref{find_fiducial}, which force fiducial points and integration paths to lie in distribution, even within large and overparametrised encoding spaces. A robustness assessment with performance metrics can be found in Section~\ref{app:robustness} of the supplementary material.

\subsection{Qualitative Evaluation} \label{sec:qualitative}

\begin{figure*}[t]
\centering
\includegraphics[width=0.99\textwidth]{images/qualitative.png}
\caption{Sample uncertainty attribution masks for selected attribution methods. Masks correspond to digits (top), smiles (middle) and bags under the eyes (bottom).} \label{fig:qualitative}
\end{figure*}

Figure \ref{fig:qualitative} shows sample uncertainty attribution masks for the best-performing methods; further examples are given in Section~\ref{app:Examples} of the supplementary material. In the figure, masks for vanilla IG and guided IG are computed with counterfactual fiducial baselines, to avoid the noisy saliency masks previously observed in Figure \ref{feature_face}. Counterfactual baselines make it possible to isolate small subsets of pixels associated with predictive uncertainty, although producing them requires an autoencoder. When combined with an integration path that is also defined through a generative model, the attribution method we have presented produces clustered attributions that are decorrelated from raw pixel-value differences between an image and its counterfactual, unlike CLUE importances. This yields sparser and more easily interpretable uncertainty attributions, a property reportedly associated with better performance in quantitative evaluations \citep[cf.][]{kapishnikov2021guided}. Finally, segmentation-based mechanisms perform poorly on the data sets we have explored, since these do not contain varied objects that can be easily separated.
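
The contrast with importances based on raw pixel differences can be illustrated through the general form of a path attribution. As a sketch in our own notation, consider a differentiable path $\gamma : [0, 1] \to \mathcal{X}$ in the input space $\mathcal{X}$, running from a fiducial $\gamma(0) = x'$ to the image $\gamma(1) = x$, with predictive entropy $\mathrm{H}$ as the target,
\begin{equation*}
A^{\gamma}_i(x; x') = \int_0^1 \frac{\partial \mathrm{H}}{\partial x_i}\big(\gamma(\alpha)\big) \, \frac{\mathrm{d} \gamma_i(\alpha)}{\mathrm{d} \alpha} \, \mathrm{d}\alpha.
\end{equation*}
For the straight line $\gamma(\alpha) = x' + \alpha(x - x')$ we have $\mathrm{d}\gamma_i / \mathrm{d}\alpha = x_i - x'_i$, so each attribution is the raw difference $x_i - x'_i$ scaled by an averaged gradient, and pixels that coincide in $x$ and $x'$ receive zero attribution. Along a curved path, such as one traced through a generative model, $\mathrm{d}\gamma_i / \mathrm{d}\alpha$ varies with $\alpha$ and the attribution no longer factorises through $x_i - x'_i$.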

\section{Discussion}\label{sec:discussion}

In this paper, we have introduced a novel framework for the attribution of predictive uncertainties in classification models, which combines path methods, counterfactual explanations and generative models. It provides an additional tool for improving transparency and interpretability in deep learning applications.

We have also provided comprehensive benchmarks of existing approaches for explaining predictive uncertainties, as well as of vanilla adaptations of popular score attribution methods. For this purpose, we have leveraged standard feature removal and addition techniques. Our experimental results show that combining counterfactual fiducials with straight or guided path integrals is sufficient to attain the best performance in simple classification tasks with greyscale images. However, complex images benefit from more refined integration paths, which can only be defined through a generative process such as the one described in this paper.

The method presented in this paper is applicable to classification models for data sets in which realistic images can feasibly be synthesised through a generative model. This currently includes a variety of application domains, such as human faces, postures, pets, handwriting, clothes, or landscapes \citep{creswell2018generative}. Moreover, the scope of such models, and their ability to synthesise new types of images, is growing rapidly. We have also shown that our method does not require a particularly accurate generative process: the uncertainty attribution procedure we have presented yields top-performing results even in the presence of errors and dissimilarities in image reconstructions.

\begin{contributions} 
I.~Perez and P.~Skalski contributed equally to this paper.
\end{contributions}

\begin{acknowledgements} 
We thank K.~Wong and M.~Barsacchi for the support and discussions that helped shape this manuscript. We thank Featurespace for the resources provided during the completion of this research.
\end{acknowledgements}

\bibliography{perez_396}

\end{document}
