\section{Proposed Approach}
LIME builds a simple local surrogate model to approximate the decision boundary near the IE (details in \Cref{subsection: Overview-LIME}). The LIME and SLICE implementations use a frequentist Ridge Regression as the surrogate model, whereas BayLIME uses a Bayesian Ridge Regression. Each coefficient of the surrogate model maps to one superpixel and represents the impact (i.e., sign and magnitude) of that superpixel on the output probability. Hence, a coefficient whose sign flips leaves the direction of the corresponding superpixel's impact on the output probability uncertain. This section discusses our approach to reducing the uncertainty in the coefficients' signs using our novel Sign Entropy regularization. As an added advantage, the regularization also stabilizes the relative ranks of the coefficients, further enhancing explainability. We first present it as a general-purpose regularization technique; subsequent sections show its applicability to tabular and image datasets.


\begin{comment}

details in \Cref{sub:bayesian_formulation} and \Cref{sub:sign_entropy_regularization}.
We propose a Sign Entropy regularization using the Bayesian paradigm that enforces sparsity by eliminating coefficients with a high Sign Entropy during the training phase. A high degree of sign flips is not desirable for obtaining consistent explanations as it makes it unclear regarding the direction of impact of a feature on the target. \cite{Bora_2024_CVPR} proved that a reliable Sign Entropy estimation of the coefficients can be done using bootstrapping and the frequentist Ridge Regression. This estimated Sign Entropy was used for feature selection. Our use of Bayesian Framework enables the estimation of Sign Entropy, using the mean and standard deviation of the coefficients. We then impose a Sign Entropy prior on the coefficients during the optimization to enforce sparsity. The approach is discussed in 

The Sign Entropy is computed provided by the Bayesian framework to estimate the probability of coefficients' sign flips directly. This helps us to avoid the feature selection step compared to \cite{Bora_2024_CVPR}. Additionally, as a side-effect of stabilizing the sign of the coefficients, the relative ranks of the coefficients also gets stabilized.    
\end{comment}



\subsection{Bayesian Formulation}
\label{sub:bayesian_formulation}
The Bayesian Ridge Regression model is defined as \( y = X\beta + \epsilon \), where \( \beta \) is the vector of coefficients and \( \epsilon \) is Gaussian noise with precision parameter \( \alpha \). It places a Gaussian ($\mathcal{N}$) prior \(p(\beta \mid \lambda)\) over \( \beta \), given by:

\[
p(\beta \mid \lambda) = \mathcal{N}(0, \lambda^{-1} I),
\]

\noindent where \( \lambda \) controls the regularization strength (i.e., it is the precision of the prior) and $I$ is the identity matrix.

The likelihood of the observed data \( y \), given \( X \) and \( \beta \), is Gaussian:
\[
p(y \mid X, \beta, \alpha) = \mathcal{N}(y \mid X\beta, \alpha^{-1} I),
\]
\noindent where, \( \alpha \) represents the precision of the noise. 

Using Bayes' theorem, the posterior distribution over \( \beta \), given \( X \) and \( y \), is obtained as:
\[
p(\beta \mid X, y, \alpha, \lambda) = \mathcal{N}(\beta \mid \mu_\beta, \Sigma_\beta),
\]

\noindent where, mean \( \mu_\beta \) and covariance \( \Sigma_\beta \) are given by:

\[
\mu_\beta = \alpha \Sigma_\beta X^T y,
\quad 
\Sigma_\beta = (\alpha X^T X + \lambda I)^{-1}.
\]

\noindent Here, \( \lambda \) and \( \alpha \) are the hyper-parameters of Bayesian Ridge Regression, each typically assigned a Gamma distribution prior.
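
\noindent As a brief sketch of where the posterior mean and covariance come from (a standard completing-the-square argument, included only for illustration), note that the log-posterior is quadratic in \( \beta \):
\begin{align*}
\log p(\beta \mid X, y, \alpha, \lambda) &= -\frac{\alpha}{2}\,\| y - X\beta \|^2 - \frac{\lambda}{2}\,\| \beta \|^2 + \text{const} \\
&= -\frac{1}{2}\,\beta^T (\alpha X^T X + \lambda I)\,\beta + \alpha\,\beta^T X^T y + \text{const}.
\end{align*}
Matching terms with the Gaussian form \( -\frac{1}{2}(\beta - \mu_\beta)^T \Sigma_\beta^{-1} (\beta - \mu_\beta) \) gives \( \Sigma_\beta^{-1} = \alpha X^T X + \lambda I \) and \( \Sigma_\beta^{-1}\mu_\beta = \alpha X^T y \). Equivalently, \( \mu_\beta = (X^T X + \tfrac{\lambda}{\alpha} I)^{-1} X^T y \), so the posterior mean coincides with a frequentist ridge estimate whose penalty is \( \lambda/\alpha \).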

Bayesian Ridge Regression follows an iterative Bayesian update process, where the posterior at each step serves as the prior for the next iteration \citep{tipping2001sparse, mackay1992bayesian}. We extend this approach by enforcing a Sign Entropy prior dynamically during optimization. Instead of using a Gaussian prior that does not enforce sign stability, we apply the Sign Entropy prior to refine the feature set at each iteration based on the posterior distribution of the coefficients. This ensures that only stable features contribute to learning in subsequent iterations. The sparsity-enforcing Sign Entropy prior thus acts as a structured regularization mechanism that captures stable, consistent coefficient estimates. The proposed Sign Entropy Regularization is discussed in detail in the next sub-section.

\subsection{Sign Entropy Regularization}
\label{sub:sign_entropy_regularization}
 
\noindent For the $j^{th}$ coefficient \( \beta_j \), the posterior variance \( \sigma_j^2 \) is the corresponding diagonal entry of \( \Sigma_\beta \):
\[
\sigma_j^2 = \Sigma_\beta[j, j]
\]

\noindent As the marginal posterior distribution of \( \beta_j \) is \( \mathcal{N}(\beta_j \mid \mu_j, \sigma_j^2) \), we can calculate the probability that \( \beta_j \) is positive ($p^+$) as follows:
\begin{align*}
p^+ &= P(\beta_j > 0) = 1 - P(\beta_j \leq 0) \\
    &= 1 - \Phi\left(-\frac{\mu_j}{\sigma_j}\right) %= \Phi\left(\frac{\mu_j}{\sigma_j}\right)
\end{align*}

\noindent where \( \mu_j \) is the posterior mean and \( \sigma_j = \sqrt{\sigma_j^2} \) is the posterior standard deviation of \( \beta_j \), and \( \Phi \) is the Cumulative Distribution Function (CDF) of the standard normal distribution.
% with standard deviation \( \sigma_j = \sqrt{\sigma_j^2} \),

The Sign Entropy \(H(\beta_{j})\) is computed using \( p^+ \) and \( p^-\) (i.e., \( p^- = 1 - p^+ \)) as below:
\[
H(\beta_{j}) = -p^{+} \log_2(p^{+}) - p^{-} \log_2(p^{-}),
\]
\noindent where \( p^{+} \) and \( p^{-} \) are the estimated probabilities that \( \beta_{j} \) is positive and negative, respectively. \( H(\beta_{j}) \) ranges from 0 (the sign is certain) to 1 bit (\( p^{+} = p^{-} = 0.5 \)); a high value of Sign Entropy indicates that the coefficient's sign has a high probability of flipping.
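
\noindent As a small worked illustration (the numerical values are hypothetical and chosen only to show the computation), consider a coefficient with posterior mean \( \mu_j = 0.5 \) and posterior standard deviation \( \sigma_j = 1 \):
\begin{align*}
p^{+} &= 1 - \Phi(-0.5) = \Phi(0.5) \approx 0.691, \qquad p^{-} \approx 0.309, \\
H(\beta_j) &\approx -0.691 \log_2(0.691) - 0.309 \log_2(0.309) \approx 0.89 .
\end{align*}
By contrast, \( \mu_j = 2 \) with the same \( \sigma_j = 1 \) gives \( p^{+} = \Phi(2) \approx 0.977 \) and \( H(\beta_j) \approx 0.16 \): the further the posterior mean lies from zero relative to its spread, the lower the Sign Entropy.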

The Sign Entropy prior applied to the coefficients at each iteration enforces sparsity by eliminating features with high entropy:
\[
\mathcal{F}^{(t+1)} = \mathcal{F}^{(t)} \setminus \{ j \mid H(\beta_j) > \zeta \},
\]
where \( \mathcal{F}^{(t)} \) is the set of active features at iteration \( t \), \(\setminus\) denotes set difference, and \(\zeta\) is a hyper-parameter giving the highest acceptable Sign Entropy (details in \Cref{sec:map-objective}). Features with \( H(\beta_j) > \zeta \) are eliminated from the model in the next iteration of the optimization process.
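
\noindent To illustrate the elimination rule, consider three active features with hypothetical posterior summaries and an assumed threshold \( \zeta = 0.5 \) (both are chosen only for illustration and are not taken from the experiments); here \( \sigma_j \) denotes the posterior standard deviation \( \sqrt{\Sigma_\beta[j,j]} \):
\begin{center}
\begin{tabular}{c c c c c l}
\hline
$j$ & $\mu_j$ & $\sigma_j$ & $p^{+}$ & $H(\beta_j)$ & Decision \\
\hline
1 & $2$  & $1$ & $0.977$ & $0.16$ & retained \\
2 & $1$  & $1$ & $0.841$ & $0.63$ & eliminated \\
3 & $-3$ & $1$ & $0.001$ & $0.01$ & retained \\
\hline
\end{tabular}
\end{center}
Starting from \( \mathcal{F}^{(t)} = \{1, 2, 3\} \), the update yields \( \mathcal{F}^{(t+1)} = \{1, 3\} \). Feature 3 is retained although its coefficient is negative, because Sign Entropy measures uncertainty about the sign rather than the sign itself. Since \( H(\beta_j) \) depends on the posterior only through \( |\mu_j|/\sigma_j \) and decreases as this ratio grows, thresholding the entropy is equivalent to thresholding that ratio; for the assumed \( \zeta = 0.5 \), a feature is eliminated roughly when \( |\mu_j|/\sigma_j < 1.23 \).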





\subsection{Evaluation of Sign Entropy Regularization}
We first validate the proposed Sign Entropy regularization by comparing it with a family of approaches that use different regularization strategies: frequentist Lasso \citep{tibshirani1996regression}, Ridge \citep{hoerl1970ridge}, Bayesian Ridge \citep{mackay1992bayesian, tipping2001sparse}, Automatic Relevance Determination (ARD) \citep{mackay1992bayesian, salakhutdinov2024lecture2}, and Ordinary Least Squares (OLS) \citep{kutner2005applied}. To establish the efficacy of our Sign Entropy-based regularization, we compute the Average Sign Flip Entropy (ASFE) \citep{Bora_2024_CVPR} and the Root Mean Square Error (RMSE) for all approaches on two public datasets: the House Prices - Advanced Regression Techniques dataset from Kaggle \citep{kaggle_houseprice_competition} and the Appliance Energy Prediction dataset \citep{candanedo2017data} from the UCI repository. We used the Scikit-learn implementations \citep{scikit-learn-full} of the SOTA approaches and wrote our own code\footnote{\url{https://github.com/rebathip/BELIEF.git}} in Python. %We will make all our code publicly available.

\begin{figure}[h]
\centering
\includegraphics[width=0.49\textwidth]{figures/combined_plot.png}
\caption{Distribution of ASFE and RMSE scores of the proposed method and other methods for Housing Price and Energy appliances datasets. The proposed method achieves much lower ASFE score while maintaining comparable RMSE with other methods.}
\label{fig:asfe_rmse_regularization}
\end{figure}

We evaluate Lasso, Ridge, Bayesian Ridge, and the proposed method at different settings of the regularization hyper-parameter ($\alpha=0.1, 0.5, 1$ for Lasso and Ridge, and $\lambda_{init}=0.1, 0.5, 1$ for Bayesian Ridge)\footnote{OLS does not have a regularization term and ARD does not have the $\lambda_{init}$ hyper-parameter.}. This enables us to compare the proposed regularization with the other methods at different regularization strengths. We then compute the ASFE and RMSE\footnote{We normalized RMSE to a scale of [0,1] using min-max scaling.} metrics using five-fold cross-validation with five repeats. As seen in \Cref{fig:asfe_rmse_regularization}, the proposed regularization scheme achieves lower ASFE than all other approaches, indicating stable, consistent coefficient signs. The margin is large, as reflected in the low overlap between the ASFE scores of the proposed method and those of the other methods. Further, we observe no significant degradation in RMSE relative to the other approaches. To ascertain that our method does not impair predictive power, we conducted a two-sample Kolmogorov-Smirnov (KS) test \citep{hodges1958significance} comparing the distribution of RMSE scores of our proposed method with that of each other method. We used the KS test owing to its non-parametric nature. The null hypothesis $H_{0}$ was that the two distributions are identical, and the alternative hypothesis was that they are not. The p-values (see \Cref{tab:ks_test} in the supplementary material) were much higher than the commonly accepted threshold of 0.05, providing insufficient statistical evidence to reject $H_{0}$. Thus, our method achieves high stability in terms of coefficients' sign flips while retaining comparable predictive power.
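
\noindent For reference, the two standard quantities used in this comparison can be written as follows. The symbols \( r \), \( r_{\min} \), \( r_{\max} \), \( \hat{F}_A \), \( \hat{F}_B \), and \( D \) are introduced here only for illustration, and the set over which the minimum and maximum are taken is an assumption, since it is not specified above:
\[
\tilde{r} = \frac{r - r_{\min}}{r_{\max} - r_{\min}},
\qquad
D = \sup_{x} \left| \hat{F}_A(x) - \hat{F}_B(x) \right|,
\]
where \( r \) is a raw RMSE score, \( r_{\min} \) and \( r_{\max} \) are the smallest and largest scores in the set being normalized, and \( \hat{F}_A \) and \( \hat{F}_B \) are the empirical CDFs of the RMSE scores of the two methods being compared. Assuming one score per held-out fold, five-fold cross-validation with five repeats yields \( 5 \times 5 = 25 \) scores per method and hyper-parameter setting. A small \( D \) corresponds to a large p-value, i.e., no evidence that the two RMSE distributions differ.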

%Intrigued by the general decrease in ASFE and a small increase in RMSE, we further quantify the magnitude of decrease in ASFE and increase in RMSE by studying the effect size. We make use of Cliff's Delta \cite{} to analyze this in a non-parametric manner. As noted from \Cref{fig:cliffs_delta}, we see that the decrease in ASFE is much higher (close to 1) as compared to the increase in RMSE hovering around $0$. Further, we see that in some scenarios our proposed regularization has a lower RMSE as compared to other approaches. The analysis confirms a high decrease in ASFE with a minimal increase in RMSE suggesting the approach as a better alternative for building explainable models with consistent explanations.