\section{Introduction}
\label{sec:intro}

Many explanation methods for Deep Learning (DL) models, such as Class Activation Map (CAM) based methods (e.g., Grad-CAM \citep{selvaraju2017grad}, Grad-CAM++ \citep{chattopadhay2018grad}), require access to intermediate layers, which is often difficult to obtain. To address this problem, model-agnostic explanation methods have been proposed that do not need access to the layers of DL models and work in a completely black-box setting. Methods such as LIME \citep{ribeiro2016should}, SHAP \citep{lundberg2017unified}, and their variants have demonstrated their applicability in the black-box setting, making them a popular choice for post-hoc explanations.

Despite the popularity and model-agnostic property of LIME \citep{ribeiro2016should}, a number of inconsistencies in LIME have been reported \citep{zhang2019should, gosiewska2019not, li2023g, lee2023towards, zhao2021baylime, zafar2019dlime, zhou2021s}. \cite{gosiewska2019not} and \cite{lee2023towards} highlight the instability of additive explanations and observe variations in feature importance across different methods. Additionally, \cite{zhang2019should} note (i) variability in explanations due to sampling, (ii) dependence on hyper-parameters such as neighborhood size and sample count, and (iii) fluctuations in model reliability across different instances. These factors lead to inconsistent explanations, making them unreliable.


\begin{figure}[htp]
\centering
\includegraphics[width=0.49\textwidth]{figures/lime_newfoundland_181.jpg}
\caption{Top five positive and negative superpixels (segments) of inconsistent LIME explanations across four different runs, for a randomly selected image from the Oxford-IIIT Pets dataset with the Inception V3 model. The predicted class was Newfoundland, and the prediction probability was 0.46. Blue and red colors denote positive and negative superpixels, respectively, and the numbers inside the superpixels specify their importance rank. By addressing these limitations, our proposed approach yields a consistent explanation for the same image and model (see \Cref{fig:slice_belief_consistency} in the supplementary material).}
\label{fig:lime_inconsistency}
\end{figure}

As shown in \Cref{fig:lime_inconsistency}, the inconsistency of LIME explanations is visible in the highlighted superpixels, which flip between positive (blue) and negative (red) contributions to the output probability. Furthermore, the importance ranks of both the positively and the negatively contributing superpixels (segments) vary across runs. These inconsistencies make interpretation challenging \citep{Bora_2024_CVPR}. The flipping of a superpixel's sign across independent runs reflects uncertainty in that sign, which is quantified by the Sign Entropy \citep{Bora_2024_CVPR}. Estimating this uncertainty by bootstrapping a frequentist Ridge Regression, and eliminating superpixels whose signs are highly uncertain (i.e., selecting features with low sign entropy), has been shown to stabilize LIME explanations \citep{Bora_2024_CVPR}. This, however, comes at the cost of a considerable increase in execution time due to the bootstrapping approach. In this paper, we propose a novel Sign Entropy Regularization based on the Bayesian paradigm to estimate this uncertainty and mitigate the inconsistencies while achieving a significantly faster ($\approx10\times$) execution time.
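
To make the notion of sign entropy concrete, it can be sketched as the binary entropy of a coefficient's sign. The notation here is introduced for illustration only: $\beta_{j}$ denotes the surrogate-model coefficient of the $j^{th}$ superpixel, \( p^{+} \) the estimated probability that \( \beta_{j} \) is positive, and \( p^{-} = 1 - p^{+} \) the estimated probability that it is negative:
\[
H(\beta_{j}) = -p^{+} \log_2(p^{+}) - p^{-} \log_2(p^{-}).
\]
\noindent For example, a superpixel whose sign is estimated to be positive with probability \( p^{+} = 0.5 \) attains the maximum value $H(\beta_{j}) = 1$ bit, whereas a superpixel whose sign never flips (\( p^{+} = 1 \), with the usual convention $0 \log_2 0 = 0$) has $H(\beta_{j}) = 0$.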


\begin{comment}
It can be defined as:

\[
H(\beta_{j}) = -p^{+} \log_2(p^{+}) - p^{-} \log_2(p^{-}),
\]
\noindent where $H(\beta_{j})$ is the Sign Entropy of the $j^{th}$ superpixel, \( p^{+} \) is the estimated probability that \( \beta_{j} \) is positive and \( p^{-} = 1 - p^{+} \) is the estimated probability that \( \beta_{j} \) is negative. A high value of sign entropy indicates that the coefficient’s sign has a high probability of flipping.

 %It therefore becomes difficult to determine the direction of impact of a superpixel on the output probability if such flipping of sign occur across different runs. Further, the variation in importance ranks also complicates understanding the model's (Inception V3 in this case) behavior. 

%We use the term Sign Entropy to denote this uncertainty in the sign (positive or negative contribution) of the superpixels and quantify the variation in superpixel importance ranks (similar to \cite{Bora_2024_CVPR}) using Average Rank Similarity (ARS).    
\end{comment}
