\newpage
\clearpage
\onecolumn
\renewcommand{\thesection}{S\arabic{section}}
\renewcommand{\thefigure}{S\arabic{figure}}
\renewcommand{\thetable}{S\arabic{table}}
\setcounter{figure}{0}
\setcounter{table}{0}
\setcounter{section}{0}
\setcounter{page}{1}
\title{BELIEF - Bayesian Sign Entropy Regularization for LIME Framework\\(Supplementary Material)}
\maketitle

\appendix

\section{Definitions}
\label{sup:definitions}
\subsection{Overview of LIME}
\label{subsection: Overview-LIME}
LIME is a popular post-hoc, model-agnostic method for interpreting the predictions of complex machine learning models \citep{ribeiro2016should}. LIME approximates a complex model locally with a simpler, transparent model (such as linear regression or a decision tree) called a surrogate model. Because the surrogate model is transparent, it can be used to explain individual predictions in that locality. Mathematically, LIME solves the following optimization problem:
\begin{equation}
\min_{g \in \mathcal{G}} \ \mathcal{L}(f, g, \pi_x) + \Omega(g)
\end{equation}
\noindent where \(f(x)\) is the prediction of the complex model for instance \(x\), \(g(x')\) is the prediction of the surrogate model for a representation \(x'\) of instance \(x\), and \(\pi_x(z)\) is a proximity measure between instance \(x\) and \(z\). The term \(\mathcal{L}(f, g, \pi_x)\) measures how unfaithfully \(g\) approximates \(f\) in the vicinity of \(x\), weighted by the proximity measure \(\pi_x(z)\), and \(\Omega(g)\) measures the complexity of the surrogate model.

LIME's optimization aims to find a surrogate model \(g\) that approximates the complex model \(f\) in the neighbourhood of \(x\) and is transparent in nature.
The most important aspects of LIME are the choice of the representation \(x'\) and the measure of locality \(\pi_x(z)\). The authors use a binary vector \(x'\) indicating the presence or absence of interpretable components (such as words in text or superpixels/segments in images). Further, a weight function gives higher weight to instances that are closer to \(x\). This weight function uses an exponential kernel, i.e., \(\pi_x(z) = \exp(-Dist(x, z)^2/\sigma^2)\), where \(Dist(x, z)\) is the cosine distance between \(x\) and \(z\), and \(\sigma\) is a kernel width parameter.
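
\noindent As a brief illustration of the locality weighting, the loss used with linear surrogates in \citep{ribeiro2016should} is a proximity-weighted squared error over the set of perturbed samples \(\mathcal{Z}\). The numerical values below are hypothetical and chosen only to show how quickly the kernel decays; \(\sigma = 0.25\) is an illustrative kernel width, not a setting reported in this work.
\[
\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z) \left( f(z) - g(z') \right)^2
\]
\[
Dist(x,z) = 0.1 \;\Rightarrow\; \pi_x(z) = \exp\left(-\tfrac{0.01}{0.0625}\right) \approx 0.85,
\qquad
Dist(x,z) = 0.5 \;\Rightarrow\; \pi_x(z) = \exp\left(-\tfrac{0.25}{0.0625}\right) \approx 0.018
\]
Under these assumed values, a perturbed sample five times farther from \(x\) contributes roughly 47 times less to the surrogate fit.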

In the context of explaining images, this involves transforming the problem from an image format to a tabular format. The process has the following main steps: (1) divide the image into superpixels or segments using segmentation, (2) generate random perturbation vectors with length equal to the number of superpixels, (3) perturb the superpixels accordingly and record the predictions (output probabilities), (4) build a surrogate model with the perturbation vectors as X and the predictions from step 3 as y, and (5) extract explanations from the surrogate model. This transformation of the problem from images into a tabular format is the vital part of how LIME generalizes the extraction of explanations to image classification models.

\subsection{Evaluation Metrics}
%\subsection{Proposed Consistency Metric}
%\label{subsection: Proposed_Stability_Metrics}
We use the same consistency metrics as \citep{Bora_2024_CVPR}. \cite{Bora_2024_CVPR} propose two consistency evaluation metrics that address two aspects of consistency, namely sign flips of the coefficients and the variance in the importance ranks of the coefficients of the surrogate model. These metrics are defined below:
\begin{enumerate}
  \item \textbf{Average Sign Flip Entropy (ASFE)}: This metric measures the variability in the sign of a superpixel across multiple runs. A lower value of ASFE indicates that the superpixels concerned have a lower probability of sign flips across multiple runs. ASFE for model `M' and explanation technique `xp' is calculated as below:
    \begin{equation}
    ASFE_{M}^{xp} = \frac{1}{n} \sum_{i=1}^{n} H(\text{{sign}}_{i})
    \end{equation}
    \noindent where, 
    \[ H(\text{{sign}}_{i}) = - p_i^{+} \log_{2} (p_i^{+}) - p_i^{-} \log_{2}(p_i^{-}) \]

Here, $H(\text{{sign}}_{i})$ is the sign entropy of the ${i}^{th}$ superpixel, and \(p_i^{+}\) and \(p_i^{-}\) denote the probabilities that the coefficient of the ${i}^{th}$ superpixel is positive or negative, respectively. Kernel Density Estimation (KDE), owing to its non-parametric nature, is used to estimate \(p_i^{+}\) and \(p_i^{-}\) for each of the $n$ superpixels. Scott's method is used for bandwidth selection \citep{scott2015multivariate}. $ASFE_{M}^{xp}$ ranges over $[0,1]$, where 0 represents no sign flips and 1 denotes a 50\% probability of the coefficient being positive, i.e., maximal sign-flip uncertainty.

  \item \textbf{Average Rank Similarity (ARS)}: This measure quantifies the consistency in the importance ranks of superpixels across multiple runs. A higher ARS score indicates that the importance ranks of the superpixels have more agreement across multiple runs than a lower ARS score. Rank Biased Overlap (RBO) score \citep{webber2010similarity} is used to calculate the ARS across different runs for model `M' and explanation technique `xp' as per the equation below.
\begin{equation}
ARS_{M}^{xp} = \frac{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} rbo\_ext(\boldsymbol{R_i}, \boldsymbol{R_j})}{\binom{m}{2}}
\label{eqn:eqn-ars}
\end{equation}

Here, $\boldsymbol{R_i}$ and $\boldsymbol{R_j}$ denote the ranked coefficient vectors obtained from the $i$-th and $j$-th runs, respectively. The function $rbo\_ext(\boldsymbol{R_i}, \boldsymbol{R_j})$ calculates the extrapolated Rank-Biased Overlap (RBO) score between these ranked vectors. To compute the RBO scores, we use the Python package `rbo' \citep{Chen2023}, setting the persistence parameter ($p$) to 0.2 to place greater emphasis on the highest-ranked elements. The denominator $\binom{m}{2}$ is the total number of distinct pairs of rank lists among the $m$ runs, so that the rank similarities are averaged across all pairs. The metric $ARS_{M}^{xp}$ varies between 0 and 1, where a value of 1 signifies perfect agreement in superpixel rankings across runs, whereas 0 indicates a complete lack of correspondence.

\item \textbf{Combined Consistency Metric (CCM)}: The sign entropy and the variance in the importance ranks of superpixels are quantified by the metrics ASFE and ARS, respectively. Bora et al.\ therefore combine both into a consolidated metric to understand and evaluate an XAI system. The combined metric, CCM, is defined as:

\begin{equation}
CCM_{M}^{xp} = (1-ASFE_{M}^{xp}) \times ARS_{M}^{xp}
\label{eqn:eqn-10}
\end{equation}
$CCM_{M}^{xp}$ ranges between [0,1] where 0 denotes low consistency and 1 denotes full consistency in both sign entropy and superpixel importance ranks.
\end{enumerate}
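
\noindent As a small worked illustration of how the three metrics combine, consider the following hypothetical values (they are not taken from our experiments). Suppose there are $n = 2$ superpixels with \(p_1^{+} = 1\) and \(p_2^{+} = 0.9\), and $m = 3$ runs whose three pairwise extrapolated RBO scores are 0.9, 0.8, and 0.7. Using the convention \(0 \log_2 0 = 0\):
\[
H(\text{sign}_1) = 0, \qquad
H(\text{sign}_2) = -0.9 \log_2 0.9 - 0.1 \log_2 0.1 \approx 0.469, \qquad
ASFE_{M}^{xp} \approx \frac{0 + 0.469}{2} \approx 0.235
\]
\[
ARS_{M}^{xp} = \frac{0.9 + 0.8 + 0.7}{\binom{3}{2}} = 0.8, \qquad
CCM_{M}^{xp} \approx (1 - 0.235) \times 0.8 \approx 0.61
\]
In this example, a single superpixel with a 10\% chance of flipping sign is enough to lower the combined score noticeably, even though the rank agreement is fairly high.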


\subsection{Adapted Area Over Perturbation Curve}
We employ the adapted Area Over Perturbation Curve (AOPC), introduced in \cite{Bora_2024_CVPR}, to assess the fidelity of explanations generated by LIME, BayLIME, SLICE, and BELIEF. Originally proposed by \cite{samek2016evaluating} as an enhancement of the method by \cite{bach2015pixel}, AOPC quantifies the reduction in predicted probability ($\hat{Y}$) as an image undergoes perturbation, where in our case, perturbations are applied to superpixels based on their ranked importance. While AOPC was initially formulated for deletion, it has since been adapted for insertion as well. The modified AOPC metric is formally defined below.
\begin{equation}
\label{eqn:aopc}
    AOPC_d = \frac{1}{L+1} \left \langle \sum_{k=1}^{L} \Delta f(x, k) \right \rangle_{p(x)}
\end{equation}

\noindent where, the term $\Delta f(x, k)$ represents the variation in the classifier’s output probability after $k$ perturbation steps, either as an increase or a decrease. For deletion of positive superpixels or insertion of negative superpixels, it is computed as $f(x^{(0)}) - f(x^{(k)})$, where $x^{(0)}$ denotes the original, unperturbed image. Conversely, for insertion of positive superpixels or deletion of negative superpixels, $\Delta f(x, k)$ is defined as $f(x^{(k)}) - f(x^{(0)})$, where $x^{(0)}$ corresponds to the fully perturbed (i.e., blurred) image. The level of blurring is determined using the Adaptive-blur technique from \cite{Bora_2024_CVPR}.

Further, $x^{(k)}$ refers to the image after $k$ perturbation steps in the deletion process, whereas in the insertion process it represents the progressively restored image after $k$ superpixels from the original image have been reintroduced into the blurred background. The total number of perturbation steps is denoted by $L$. The notation $\langle \cdot \rangle_{p(x)}$ indicates the expectation over all dataset images, enabling the computation of the average AOPC score across a deep learning model’s predictions.

Additionally, $d$ represents the pixel removal strategy, which can follow either the Most Relevant First (MoRF) or the Least Relevant First (LeRF) order. Since our evaluation involves all superpixels, the insertion and deletion procedures yield identical results. Thus, we conducted all experiments using the MoRF strategy and refer to the computed metric simply as AOPC. As AOPC measures the difference in predicted probabilities between the initial and modified images, a higher AOPC score for both insertion and deletion indicates stronger fidelity. This contrasts with traditional insertion and deletion metrics, where higher insertion AUC and lower deletion AUC signify greater fidelity.
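
\noindent To make the computation concrete, consider a single image and the deletion of positive superpixels with $L = 4$ perturbation steps. The probabilities below are hypothetical and serve only to illustrate the formula: \(f(x^{(0)}) = 0.9\) and \(f(x^{(1)}), \dots, f(x^{(4)}) = 0.6, 0.4, 0.2, 0.1\).
\[
\sum_{k=1}^{4} \Delta f(x, k) = (0.9 - 0.6) + (0.9 - 0.4) + (0.9 - 0.2) + (0.9 - 0.1) = 2.3,
\qquad
\frac{1}{L+1} \sum_{k=1}^{4} \Delta f(x, k) = \frac{2.3}{5} = 0.46
\]
The reported AOPC is then the average of such per-image values over the dataset. An explanation that ranks truly influential superpixels first makes the probability drop early, which increases every term of the sum and hence the score.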

\begin{comment}
\section{Algorithm}
\label{sup:algorithm}
\begin{algorithm}
\caption{Bayesian Ridge Regression with Sign Entropy Regularization}
\label{alg:bayesian_ridge_fit}
\begin{algorithmic}[1]
\Require $X, y, \text{sample\_weight}$ (Optional)
\State Initialize feature selection mask $\mathbf{m} \gets \mathbf{0}$ (all features active)
\State Initialize $\alpha, \lambda$ if not provided
\State Preprocess $X, y$ and apply rescaling if sample weights exist
\State Compute $X^T y$
\State Initialize $\text{skip\_counter} \gets \text{skip\_iters}$, $\text{tol\_counter} \gets 0$
\State $\mathbf{\beta}_{\text{old}} \gets \mathbf{0}$, $\mathbf{\beta}_{\text{full}} \gets \mathbf{0}$
\For{$\text{iter} = 1$ to $\text{max\_iter}$}
    \State $X_{\text{selected}} \gets X[:, \mathbf{m} == 0]$ \Comment{Select retained features only}
    \If{$\text{use\_sign\_entropy\_elimination}$ and $\text{iter} \geq \text{skip\_counter}$ and $\text{tol\_counter} < \text{tol}$}
        \State Compute SVD: $U, S, V^T = \text{SVD}(X_{\text{selected}})$
        \State Compute coefficients: $\mathbf{\beta}, \text{rmse} \gets \text{UpdateCoef}(X_{\text{selected}}, y)$
        \State Compute covariance: $\sigma \gets \frac{1}{\alpha} V^T (V / (S^2 + \lambda / \alpha))$
        \State Compute sign entropy for each feature
        \State Identify high-entropy features: $\mathcal{F}_{\text{elim}} \gets \{ j | H(\beta_j) > \zeta \}$
        \If{$\mathcal{F}_{\text{elim}} = \emptyset$}
            \State $\text{tol\_counter} \gets \text{tol\_counter} + 1$
        \Else
            \State $\text{tol\_counter} \gets 0$
            \State Mark eliminated features: $\mathbf{m}[\mathcal{F}_{\text{elim}}] \gets 1$
            \State Update $\text{skip\_counter} \gets \text{iter} + \text{skip\_iters} + 1$
        \EndIf
        \State $X_{\text{selected}} \gets X[:, \mathbf{m} == 0]$ \Comment{Update feature set}
    \EndIf
    \State Compute SVD and update coefficients $\mathbf{\beta}$
    \State Store full coefficient vector: $\mathbf{\beta}_{\text{full}}[\mathbf{m} == 0] \gets \mathbf{\beta}$
    \State Set eliminated coefficients: $\mathbf{\beta}_{\text{full}}[\mathbf{m} == 1] \gets -0.0$
    \If{converged}
        \State \textbf{break}
    \EndIf
    \State Compute marginal likelihood score if required
    \State Update $\alpha$ and $\lambda$
\EndFor
\State \Return $\mathbf{\beta}_{\text{full}}, \alpha, \lambda$
\end{algorithmic}
\end{algorithm}    
\end{comment}

\section{MAP Objective with Iterative Sign Entropy Prior}
\label{sec:map-objective}

Our method introduces a prior over coefficients, one that is not based on magnitude alone, but instead constructed using both the mean and variance of each coefficient’s posterior distribution. This prior reflects a more Bayesian treatment by taking into account the full distributional behavior (both mean and variance) of the coefficients.

Carroll et al.\ \citep{carroll2009bayesregression} describe the MAP objective for Bayesian Ridge Regression, \(\hat{\beta}_{\text{MAP}}\), assuming a Gaussian likelihood with homoscedastic noise, \(
y_n \mid x_n, \beta \sim \mathcal{N}(x_n^\top \beta, \sigma^2) \), and a zero-mean Gaussian prior \(\beta_j \sim \mathcal{N}(0, \tau^2)\) over each of the \(d\) coefficients, as below:


\[
\hat{\beta}_{\text{MAP}} = \arg\min_{\beta} \left[
\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^\top \beta)^2 +
\frac{1}{2\tau^2} \sum_{j=1}^{d} \beta_j^2
\right]
\]

Letting \( \lambda_1 = \frac{1}{2\tau^2} \), we rewrite this as:

\[
\hat{\beta}_{\text{MAP}} = \arg\min_{\beta} \left[
\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^\top \beta)^2 +
\lambda_1 \sum_{j=1}^{d} \beta_j^2
\right]
\]
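
\noindent For reference, this objective has the familiar ridge closed form. The matrix notation is introduced here only for illustration: let \(X\) be the \(N \times d\) matrix with rows \(x_n^\top\), \(y\) the vector of responses, and \(I_d\) the \(d \times d\) identity. Setting the gradient of the objective to zero gives
\[
-\frac{1}{\sigma^2} X^\top (y - X\beta) + 2\lambda_1 \beta = 0
\quad\Longrightarrow\quad
\hat{\beta}_{\text{MAP}} = \left( X^\top X + 2\sigma^2 \lambda_1 I_d \right)^{-1} X^\top y
= \left( X^\top X + \frac{\sigma^2}{\tau^2} I_d \right)^{-1} X^\top y
\]
so the effective ridge penalty is the ratio of the noise variance to the prior variance.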

\textbf{Sign Entropy Prior (Our Contribution):}  
Unlike traditional priors, which penalize \(\beta_j\) based on magnitude alone (i.e., the mean), we propose to penalize \(\beta_j\) using both its mean and variance. We augment the model with the proposed prior, which iteratively and adaptively penalizes coefficients based on their Sign Entropy, computed from each coefficient's posterior distribution. The proposed iterative prior update resembles Empirical Bayes methods, where hyper-parameters are refined using the posterior information of the previous iteration \citep{tipping2001sparse}.

After iteration \( t-1 \), the posterior of each coefficient can be approximated as:

\[
\beta_j \sim \mathcal{N}(\mu_j^{(t-1)}, {(\sigma_j^2)}^{(t-1)})
\]

\noindent
We define the Sign Entropy as:
\[
\mathcal{H}(\mu_j, \sigma_j) = -p_j \log_2 p_j - (1 - p_j) \log_2 (1 - p_j),
\quad \text{where } p_j = \Phi\left(0; \mu_j, \sigma_j\right)
\]
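
\noindent A short numerical illustration may help. It assumes that \(\Phi(0; \mu_j, \sigma_j)\) is the Gaussian CDF with mean \(\mu_j\) and standard deviation \(\sigma_j\) evaluated at zero, so that \(p_j = \Phi_{\text{std}}(-\mu_j / \sigma_j)\), where \(\Phi_{\text{std}}\) (a symbol introduced here) is the standard normal CDF, and that logarithms are taken to base 2. The values of \(\mu_j / \sigma_j\) are hypothetical.
\[
\begin{array}{lll}
\mu_j / \sigma_j = 0: & p_j = 0.5, & \mathcal{H} = 1 \\
\mu_j / \sigma_j = 1: & p_j \approx 0.159, & \mathcal{H} \approx 0.631 \\
\mu_j / \sigma_j = 3: & p_j \approx 0.00135, & \mathcal{H} \approx 0.015
\end{array}
\]
Under these assumptions the Sign Entropy depends only on the ratio \(\mu_j / \sigma_j\): a coefficient whose posterior mean lies far from zero relative to its posterior spread has a nearly deterministic sign and an entropy close to zero.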


\noindent 
The Sign Entropy prior at \( t^{th} \) iteration is given as:

\begin{equation}
\label{eqn:sign_entropy_prior}
\pi(\beta_j^{(t)}) \propto \exp\left(-\lambda_2 \cdot \mathcal{H}(\mu_j^{(t-1)}, \sigma_j^{(t-1)})\right)    
\end{equation}

\noindent
Although the Sign Entropy prior in Equation \ref{eqn:sign_entropy_prior} is specified only up to proportionality, \(\mathcal{H}(\mu_j, \sigma_j)\) is bounded in $[0,1]$ (with log base 2), so the prior can be normalized over a finite domain of \(\beta_j\).



We solve the standard MAP objective on a reduced set of features, where the active set at iteration \( t \), \(\mathcal{A}^{(t)}\), is defined by a threshold \(\zeta\) on the sign entropy:

\[
\mathcal{A}^{(t)} = \left\{ j \in \{1, \dots, d\} \,\middle|\, \mathcal{H}(\mu_j^{(t-1)}, \sigma_j^{(t-1)}) \leq \zeta \right\}
\]

\noindent
We then solve the MAP problem over this active set \(\mathcal{A}^{(t)}\) as below:

\begin{equation}
\label{eqn:sign_entropy_map}
\hat{\beta}^{(t)} = \arg\min_{\beta_j = 0 \text{ for } j \notin \mathcal{A}^{(t)}} \left[
\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^\top \beta)^2 +
\lambda_1 \sum_{j \in \mathcal{A}^{(t)}} \beta_j^2
\right]    
\end{equation}
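
\noindent As a hypothetical example of the active-set step, take \(d = 3\) coefficients whose sign entropies after iteration \(t-1\) are \(0.015\), \(0.631\), and \(1.0\) (illustrative values, not experimental results).
\[
\zeta = 0.1 \;\Rightarrow\; \mathcal{A}^{(t)} = \{1\}, \quad \beta_2 = \beta_3 = 0;
\qquad
\zeta = 0.7 \;\Rightarrow\; \mathcal{A}^{(t)} = \{1, 2\}, \quad \beta_3 = 0
\]
In each case the ridge objective is minimized only over the coefficients in \(\mathcal{A}^{(t)}\), and the posterior means and variances of the refitted coefficients determine the sign entropies used at the next iteration. A smaller \(\zeta\) therefore retains fewer, more sign-stable features.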


This Sign Entropy prior penalizes coefficients whose sign is inconsistent and acts as a feedback-based prior, reducing the sign entropy of the coefficients in subsequent iterative updates. The resulting MAP objective is dynamic and evolves during optimization, leading to improved stability in the sign of the coefficients. While traditional priors such as Ridge or Lasso penalize based on the coefficient value alone (and not its variance), our Sign Entropy prior incorporates the posterior uncertainty (i.e., the mean and variance of the coefficients) of the previous iteration by penalizing sign inconsistency.


\section{Additional Plots}

\begin{figure}[htp]
\centering
\includegraphics[width=0.80\textwidth]{figures/slice_newfoundland_181.jpg}
\caption{Figure showing the top five positive and negative superpixels of explanations using BELIEF (proposed method) for a random image of the Oxford-IIIT Pets dataset with Inception V3 model for four different runs. The predicted class was Newfoundland, and the prediction probability was 0.46. Blue and red colors denote positive and negative superpixels, and the numbers inside the superpixels specify their importance and rank. There is no inconsistency of superpixels sign i.e., a superpixel deemed as positive in one run is not marked as negative in another and vice-versa. Further, the superpixel importance ranks for both positive and negative superpixels remain stable across all runs.}
\label{fig:slice_belief_consistency}
\end{figure}

\begin{figure}[htp]
\centering
\includegraphics[width=0.60\textwidth]{figures/cliff_both.png}
\caption{Distribution of Cliff's Delta effect sizes of ASFE gain and RMSE loss for the proposed Sign Entropy regularization compared to other well-known approaches on both the Energy and Housing datasets. ASFE gain is the decrease in ASFE score (i.e., the improvement in Sign Entropy of the coefficients) and RMSE loss is the increase in RMSE score (i.e., the increase in the RMSE of the Linear Regression model). The effect size for ASFE gain is almost always positive and high (`1') except for two cases. The effect size of RMSE loss is either very low or negative. This indicates that our proposed regularization can reduce the sign entropy significantly while keeping the RMSE comparable. We perform additional statistical tests to confirm our claims. Please refer to \Cref{tab:ks_test} for details of the conducted statistical tests.}
\label{fig:cliffs_delta}
\end{figure}



\begin{figure*}[htp]
\centering
\begin{minipage}[b]{0.45\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figures/ecdf_ccm_belief_slice_lime_baylime.pdf}
    \caption{ECDF plot of CCM Scores for BELIEF, LIME, BayLIME and  SLICE (higher score is better)}
    \label{fig:ecdf_ccm}
\end{minipage}
\hfill
\begin{minipage}[b]{0.45\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figures/ecdf_ccm_ablation.png}
    \caption{ECDF plot of CCM Scores for BELIEF, BELIEF\_FE, SLICE\_blur, SLICE\_FE and  SLICE (higher is better)}
    \label{fig:ablation_study}
\end{minipage}
\end{figure*}



\begin{figure*}[htp]
\centering
\begin{minipage}[b]{0.45\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figures/ccm_belief_slice_lime_baylime.pdf}
    \caption{Distribution of CCM Scores for BELIEF, LIME, BayLIME, and SLICE (higher is better).}
    \label{fig:combined_score}
\end{minipage}
\hfill
\begin{minipage}[b]{0.45\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figures/ccm_belief_belief-fe_slice_sliceblur_sliceblurfe.pdf}
    \caption{Distribution of CCM Scores for BELIEF, BELIEF\_FE, SLICE\_blur, SLICE\_FE, and SLICE (higher is better).}
    \label{fig:ablation_study_density}
\end{minipage}
\end{figure*}



\begin{figure*}[htp]
\centering
\begin{subfigure}{.48\textwidth}
  \centering
  \includegraphics[width=0.9\textwidth]{figures/aopc_del.png}
  \caption{ECDF plots of AOPC deletion scores}
  \label{fig:aopc_del}
\end{subfigure}%
\begin{subfigure}{.48\textwidth}
  \centering
  \includegraphics[width=0.9\textwidth]{figures/aopc_ins.png}
  \caption{ECDF plots of AOPC insertion scores}
\label{fig:aopc_ins}
\end{subfigure}
\caption{ECDF plots of AOPC (Higher AOPC indicates higher fidelity)}
\label{fig:ecdf-plots-aopc-ins-del}
\end{figure*}

\begin{figure*}[htp]
\centering
\begin{subfigure}{.48\textwidth}
  \centering
    \includegraphics[width=0.95\textwidth]{figures/del.png}
    \caption{ECDF plots of Deletion AUC (lower is better).}
\label{fig:del_auc}
\end{subfigure}%
\begin{subfigure}{.48\textwidth}
  \centering
    \includegraphics[width=0.95\textwidth]{figures/ins.png}
    \caption{ECDF plots of Insertion AUC (higher is better).}
\label{fig:ins_auc}
\end{subfigure}
\caption{ECDF plots of Deletion and Insertion AUC}
\label{fig:ecdf-plots-ins-del}
\end{figure*}

\section[Sensitivity Analysis of zeta]{Sensitivity Analysis of hyper-parameter $\zeta$}

\begin{table}[h]
\centering
\caption{Mean CCM scores for different values of $\zeta$ on the Oxford-IIIT Pets dataset.}
\label{tab:ccm_scores_sensitivity}
\begin{tabular}{|c|c|}
\hline
\textbf{$\zeta$ Value} & \textbf{Mean CCM Score} \\
\hline
0.01 & 0.958 \\
0.1  & 0.915 \\
0.5  & 0.893 \\
0.9  & 0.880 \\
1.0  & 0.851 \\
\hline
\end{tabular}
\end{table}

\begin{figure}[htbp]
  \centering
  % ----- scale the whole mosaic (graphics + captions) -----
  \begin{adjustbox}{width=.77\linewidth}

    \begin{minipage}{\linewidth}
      \centering

      % ---------- first row ----------
      \begin{subfigure}[t]{0.49\linewidth}
        \centering
        \includegraphics[width=\linewidth]{rebuttal_pics/1-asfe_ecdf_zeta0to6.png}
        \caption{ECDF plot of (1 – ASFE) scores across different $\zeta$ variants for the \textbf{ResNet-50} model evaluated on 50 randomly selected images from the \textbf{Oxford-IIIT Pets} dataset.}
        \label{fig:1-asfe_zeta_sensitivity}
      \end{subfigure}\hfill
      \begin{subfigure}[t]{0.49\linewidth}
        \centering
        \includegraphics[width=\linewidth]{rebuttal_pics/arsc_ecdf_zeta0to6.png}
        \caption{ECDF plot of \textbf{ARS} scores across different $\zeta$ variants for the ResNet-50 model on the same 50-image sample of the Oxford-IIIT Pets dataset.}
        \label{fig:ars_zeta_sensitivity}
      \end{subfigure}

      \vspace{0.6em} % small vertical gap

      % ---------- second row ----------
      \begin{subfigure}[t]{0.49\linewidth}
        \centering
        \includegraphics[width=\linewidth]{rebuttal_pics/ccm_ecdf_zeta0to6.png}
        \caption{ECDF plot of \textbf{CCM} scores across different $\zeta$ variants for the ResNet-50 model on 50 Oxford-IIIT Pets images.}
        \label{fig:ccm_zeta_sensitivity}
      \end{subfigure}\hfill
      \begin{subfigure}[t]{0.49\linewidth}
        \centering
        \includegraphics[width=\linewidth]{rebuttal_pics/mean_fraction_selected_per_zeta.jpeg}
        \caption{Bar plot showing the \textbf{average (mean) ratio of selected segments} obtained with different $\zeta$ variants for the ResNet-50 model on the same image set.}
        \label{fig:sel_features_zeta_sensitivity}
      \end{subfigure}

    \end{minipage}
  \end{adjustbox}

  \caption{Sensitivity analysis for hyper-parameter $\boldsymbol{\zeta}$ of BELIEF on ResNet-50 with 50 Oxford-IIIT Pets images.}
  \label{fig:four-panel}
\end{figure}


In this section, we show the results of the sensitivity analysis of the hyper-parameter $\zeta$ for images from the Oxford-IIIT Pets dataset on the ResNet50 model. In \Cref{fig:1-asfe_zeta_sensitivity} and \Cref{fig:ars_zeta_sensitivity}, the quantities (1 - ASFE) and ARS decrease as $\zeta$ increases. This leads to an overall decrease in CCM scores with increasing $\zeta$, as shown in \Cref{tab:ccm_scores_sensitivity} and the ECDF plot of \Cref{fig:ccm_zeta_sensitivity}.








Thus, the approach becomes more conservative in selecting features (segments in this case) with a propensity for sign flips as the value of $\zeta$ goes down. This is illustrated in \Cref{fig:sel_features_zeta_sensitivity}, where lowering the value of $\zeta$ leads to a decrease in the average (mean) ratio of selected segments.

Therefore, the hyper-parameter $\zeta$ should be tuned to balance the trade-off between explainability and feature retention based on the end user’s goals. We recommend that in applications where explainability is crucial, the value of $\zeta$ be set low based on the acceptable percentage of sign flips; in other situations, it can be relaxed.


\section{Details of Statistical Tests}
\label{sec:s_cc_wilcoxon}

We performed the Wilcoxon Signed Rank test to ascertain the statistical significance of our results. Additionally, we report the Common Language Effect Size (CLES), which quantifies the proportion of pairs where a value from the first distribution is greater than a value from the second distribution, with an adjustment for tied values \citep{mcgraw1992common, vargha2000critique}.
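
\noindent The following sketch may help in reading the tables in this section. It rests on two assumptions that are not stated explicitly here: that each test uses \(n = 50\) paired images, and that \(W\) is the sum of the ranks of the positive differences. With \(X\) and \(Y\) denoting scores drawn from the first and second method, the tie-adjusted effect size described above can be written as
\[
\text{CLES} = \Pr(X > Y) + \tfrac{1}{2} \Pr(X = Y)
\]
and, under the stated assumptions, the largest attainable test statistic and the corresponding exact one-sided p-value are
\[
W_{\max} = \frac{n(n+1)}{2} = \frac{50 \times 51}{2} = 1275,
\qquad
p = 2^{-50} \approx 8.9 \times 10^{-16}
\]
Read this way, a row with \(W = 1275\) would correspond to the first method scoring higher than the second on every paired image.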

\begin{table}[h]
\centering
\caption{Wilcoxon Signed Rank test results comparing the CCM scores of BELIEF, BayLIME, and LIME. An entry x, y in the Test column denotes the test comparing methods x and y, where B, Ba, and L denote BELIEF, BayLIME, and LIME, respectively. The null hypothesis $H_{0}$ was ``the median of the differences ($CCM(x) - CCM(y)$) is equal to zero,'' and the alternative hypothesis $H_{a}$ was ``the median of the differences ($CCM(x) - CCM(y)$) is greater than zero.'' D:M denotes Dataset:Model, where O refers to the Oxford-IIIT Pets dataset and P refers to the PASCAL VOC dataset; R denotes the ResNet50 model and I denotes the Inception V3 model. W denotes the test statistic and CLES denotes the Common Language Effect Size.}
\label{tab:wilcoxon_results_ccm}
\begin{adjustbox}{width=0.43\textwidth}
\begin{tabular}{lllll}
\hline
\hline
\textbf{Test} & \textbf{D:M} & \textbf{W} & \textbf{p-value} & \textbf{CLES} \\

\hline 
B, L & O:I & 1275 & 8.9e-16 & 1.000 \\ 
B, Ba & O:I & 1275 & 8.9e-16 & 1.000 \\ 
B, L & O:R & 1267 & 2.2e-14 & 0.973 \\ 
B, Ba & O:R & 1227 & 2.2e-11 & 0.961 \\ 
B, L & P:I & 1275 & 8.9e-16 & 1.000 \\ 
B, Ba & P:I & 1275 & 8.9e-16 & 1.000 \\ 
B, L & P:R & 1275 & 8.9e-16 & 0.996 \\ 
B, Ba & P:R & 1274 & 1.8e-15 & 0.989 \\ 
\hline
\hline
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{Wilcoxon Signed Rank test results comparing the CCM scores of BELIEF, BELIEF\_FE, SLICE\_blur, SLICE\_FE, LIME, and BayLIME for the ablation study. An entry x, y in the Test column denotes the test comparing methods x and y, where B, Bf, Sb, Sf, L, and Ba denote BELIEF, BELIEF\_FE, SLICE\_blur, SLICE\_FE, LIME, and BayLIME, respectively. The null hypothesis $H_{0}$ was ``the median of the differences ($CCM(x) - CCM(y)$) is equal to zero,'' and the alternative hypothesis $H_{a}$ was ``the median of the differences ($CCM(x) - CCM(y)$) is greater than zero.'' D:M denotes Dataset:Model, where O refers to the Oxford-IIIT Pets dataset and P refers to the PASCAL VOC dataset; R denotes the ResNet50 model and I denotes the Inception V3 model. W denotes the test statistic and CLES denotes the Common Language Effect Size.}
\label{tab:wilcoxon_results_ccm_ablation}
\begin{adjustbox}{width=0.43\textwidth}
\begin{tabular}{lllll}
\hline
\hline
\textbf{Test} & \textbf{D:M} & \textbf{W} & \textbf{p-value} & \textbf{CLES} \\

\hline
B, Bf & O:I & 1275 & 8.9e-16 & 0.999 \\  
B, Sb & O:I & 1260 & 1.2e-13 & 0.925 \\ 
B, Sf & O:I & 1275 & 8.9e-16 & 0.999 \\ 
B, L & O:I & 1275 & 8.9e-16 & 1.000 \\ 
B, Ba & O:I & 1275 & 8.9e-16 & 1.000 \\ 

B, Bf & O:R & 1261 & 9.8e-14 & 0.966 \\  
B, Sb & O:R & 974 & 4.4e-04 & 0.652 \\ 
B, Sf & O:R & 1257 & 2.2e-13 & 0.962 \\ 
B, L & O:R & 1267 & 2.2e-14 & 0.973 \\ 
B, Ba & O:R & 1227 & 2.2e-11 & 0.961 \\ 

B, Bf & P:I & 1272 & 4.4e-15 & 0.992 \\  
B, Sb & P:I & 1267 & 2.2e-14 & 0.918 \\ 
B, Sf & P:I & 1275 & 8.9e-16 & 0.996 \\ 
B, L & P:I & 1275 & 8.9e-16 & 1.000 \\ 
B, Ba & P:I & 1275 & 8.9e-16 & 1.000 \\ 

B, Bf & P:R & 1275 & 8.9e-16 & 0.988 \\ 
B, Sb & P:R & 904 & 4.7e-03 & 0.615 \\ 
B, Sf & P:R & 1269 & 1.2e-14 & 0.982 \\ 
B, L & P:R & 1275 & 8.9e-16 & 0.996 \\ 
B, Ba & P:R & 1274 & 1.8e-15 & 0.989 \\ 

\hline
\hline
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{Wilcoxon Signed Rank test results comparing the CCM scores of BELIEF, BELIEF\_FE, SLICE\_blur, SLICE\_FE, LIME, and BayLIME for the ablation study. An entry x, y in the Test column denotes the test comparing methods x and y, where B, Bf, Sb, Sf, L, and Ba denote BELIEF, BELIEF\_FE, SLICE\_blur, SLICE\_FE, LIME, and BayLIME, respectively. The null hypothesis $H_{0}$ was ``the median of the differences ($CCM(x) - CCM(y)$) is equal to zero,'' and the alternative hypothesis $H_{a}$ was ``the median of the differences ($CCM(x) - CCM(y)$) is greater than zero.'' D:M denotes Dataset:Model, where O refers to the Oxford-IIIT Pets dataset and P refers to the PASCAL VOC dataset; R denotes the ResNet50 model and I denotes the Inception V3 model. W denotes the test statistic and CLES denotes the Common Language Effect Size.}
\label{tab:wilcoxon_results_ccm_BE}
\begin{adjustbox}{width=0.43\textwidth}
\begin{tabular}{lllll}
\hline
\hline
\textbf{Test} & \textbf{D:M} & \textbf{W} & \textbf{p-value} & \textbf{CLES} \\

\hline
B, Bf & O:I & 1275 & 8.9e-16 & 0.999 \\  
B, Sb & O:I & 1260 & 1.2e-13 & 0.925 \\ 
B, Sf & O:I & 1275 & 8.9e-16 & 0.999 \\ 
B, L & O:I & 1275 & 8.9e-16 & 1.000 \\ 
B, Ba & O:I & 1275 & 8.9e-16 & 1.000 \\ 

B, Bf & O:R & 1261 & 9.8e-14 & 0.966 \\  
B, Sb & O:R & 974 & 4.4e-04 & 0.652 \\ 
B, Sf & O:R & 1257 & 2.2e-13 & 0.962 \\ 
B, L & O:R & 1267 & 2.2e-14 & 0.973 \\ 
B, Ba & O:R & 1227 & 2.2e-11 & 0.961 \\ 

B, Bf & P:I & 1272 & 4.4e-15 & 0.992 \\  
B, Sb & P:I & 1267 & 2.2e-14 & 0.918 \\ 
B, Sf & P:I & 1275 & 8.9e-16 & 0.996 \\ 
B, L & P:I & 1275 & 8.9e-16 & 1.000 \\ 
B, Ba & P:I & 1275 & 8.9e-16 & 1.000 \\ 

B, Bf & P:R & 1275 & 8.9e-16 & 0.988 \\ 
B, Sb & P:R & 904 & 4.7e-03 & 0.615 \\ 
B, Sf & P:R & 1269 & 1.2e-14 & 0.982 \\ 
B, L & P:R & 1275 & 8.9e-16 & 0.996 \\ 
B, Ba & P:R & 1274 & 1.8e-15 & 0.989 \\ 

\hline
\hline
\end{tabular}
\end{adjustbox}
\end{table}
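
\noindent As a reading aid for the W and p-value columns of \cref{tab:wilcoxon_results_ccm_BE}, the following sketch works out the extreme values that a one-sided Wilcoxon signed-rank test can attain. It assumes that $W$ is the sum of the ranks of the positive differences, that the p-value is computed exactly with no ties or zero differences, and that each test uses $n = 50$ paired instances; the value of $n$ is inferred from the tabulated numbers and is not stated in the table.
\[
W_{\max} = \frac{n(n+1)}{2} = \frac{50 \cdot 51}{2} = 1275,
\qquad
p_{\min} = 2^{-n} = 2^{-50} \approx 8.9 \times 10^{-16}.
\]
Under these assumptions, a row with $W = 1275$ and a p-value of 8.9e-16 corresponds to a positive difference on every paired instance, and a row with $W = 1274$ corresponds to exactly one negative difference, carrying the smallest rank, with $p = 2 \cdot 2^{-50} \approx 1.8 \times 10^{-15}$.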

    
\begin{table}[htp]
\centering
\caption{Wilcoxon signed-rank test results for comparison of BELIEF (B), LIME (L), and BayLIME (Ba). AOPC(x,y) indicates the test where the null hypothesis $H_{0}$ was ``The median of the differences ($\text{AOPC score}(x) - \text{AOPC score}(y)$) is equal to zero,'' and the alternative hypothesis $H_{a}$ was ``The median of the differences ($\text{AOPC score}(x) - \text{AOPC score}(y)$) is greater than zero''. [D:M denotes Dataset:Model; O refers to Oxford-IIIT Pets and P refers to PASCAL VOC datasets. R denotes ResNet50 and I denotes Inception V3 models. W denotes the test statistic and CLES denotes the Common Language Effect Size.]}
\label{tab:wilcoxon_results_aopc}
\resizebox{0.46\textwidth}{!}{
\begin{tabular}{lllll}
\hline
\hline
\textbf{Test} & \textbf{D:M} & \textbf{W} & \textbf{p-value} & \textbf{CLES} \\
\hline
\multicolumn{5}{c}{Insertion}\\
\hline
AOPC(B,L) & O:I & 1229 & 1.7e-11 & 0.892 \\ 
AOPC(B,L) & O:R & 1040 & 2.6e-05 & 0.756 \\ 
AOPC(B,L) & P:I & 1187 & 1.5e-09 & 0.878 \\ 
AOPC(B,L) & P:R & 1098 & 1.1e-06 & 0.771 \\ 
AOPC(B,Ba) & O:I & 1188 & 1.4e-09 & 0.886 \\ 
AOPC(B,Ba) & O:R & 1057 & 1.1e-05 & 0.753 \\ 
AOPC(B,Ba) & P:I & 1171 & 6.3e-09 & 0.880 \\ 
AOPC(B,Ba) & P:R & 1028 & 4.5e-05 & 0.768 \\ 
\hline
\multicolumn{5}{c}{Deletion}\\
\hline
AOPC(B,L) & O:I & 1231 & 1.3e-11 & 0.889 \\ 
AOPC(B,L) & O:R & 1040 & 2.6e-05 & 0.758 \\ 
AOPC(B,L) & P:I & 1187 & 1.5e-09 & 0.874 \\ 
AOPC(B,L) & P:R & 1094 & 1.4e-06 & 0.775 \\ 
AOPC(B,Ba) & O:I & 1184 & 2.0e-09 & 0.879 \\ 
AOPC(B,Ba) & O:R & 1054 & 1.3e-05 & 0.753 \\ 
AOPC(B,Ba) & P:I & 1160 & 1.5e-08 & 0.876 \\ 
AOPC(B,Ba) & P:R & 1007 & 1.2e-04 & 0.768 \\ 
\hline
\hline
\end{tabular}
}
\end{table}

\begin{table}[htp]
\centering
\renewcommand{\arraystretch}{0.85}
\caption{Wilcoxon signed-rank test results comparing Insertion and Deletion AUCs of BELIEF (B) with LIME (L) and BayLIME (Ba) using a one-sided (greater) alternative hypothesis. AUC(x,y) denotes a test with null hypothesis $H_{0}$ that the median difference in scores between x and y is zero, against an alternative hypothesis $H_{a}$ of a positive median difference. [D:M signifies Dataset:Model; O for Oxford-IIIT Pets, P for PASCAL VOC, R for ResNet50, and I for Inception V3. W represents the test statistic and CLES the Common Language Effect Size.]}
\label{tab:wilcoxon_results_auc}
\resizebox{0.45\textwidth}{!}{
\begin{tabular}{lllll}
\hline
\hline
\textbf{Test} & \textbf{D:M} & \textbf{W} & \textbf{p-value} & \textbf{CLES} \\
\hline
\multicolumn{5}{c}{Insertion}\\
\hline
AUC(B,L) & O:I & 1230 & 1.5e-11 & 0.898 \\ 
AUC(B,Ba) & O:I & 1190 & 1.2e-09 & 0.885 \\ 
AUC(B,L) & O:R & 1051 & 1.5e-05 & 0.767 \\ 
AUC(B,Ba) & O:R & 1076 & 4.0e-06 & 0.766 \\ 
AUC(B,L) & P:I & 1183 & 2.2e-09 & 0.872 \\ 
AUC(B,Ba) & P:I & 1169 & 7.4e-09 & 0.877 \\ 
AUC(B,L) & P:R & 1113 & 4.4e-07 & 0.773 \\ 
AUC(B,Ba) & P:R & 999 & 1.6e-04 & 0.763 \\ 
\hline
\multicolumn{5}{c}{Deletion}\\
\hline
AUC(L,B) & O:I & 1230 & 1.5e-11 & 0.894 \\ 
AUC(Ba,B) & O:I & 1186 & 1.7e-09 & 0.883 \\ 
AUC(L,B) & O:R & 1056 & 1.2e-05 & 0.767 \\ 
AUC(Ba,B) & O:R & 1068 & 6.1e-06 & 0.764 \\ 
AUC(L,B) & P:I & 1184 & 2.0e-09 & 0.872 \\ 
AUC(Ba,B) & P:I & 1156 & 2.1e-08 & 0.874 \\ 
AUC(L,B) & P:R & 1098 & 1.1e-06 & 0.775 \\ 
AUC(Ba,B) & P:R & 997 & 1.8e-04 & 0.765 \\ 
\hline
\hline
\end{tabular}
}
\end{table}


\begin{table}[htp]
\centering
\caption{Wilcoxon signed-rank test results for comparison of BELIEF (B) and SLICE (S). metric(B,S) indicates the test where the null hypothesis $H_{0}$ was ``The median of the differences ($\text{metric score}(\text{BELIEF}) - \text{metric score}(\text{SLICE})$) is equal to zero,'' and the alternative hypothesis $H_{a}$ was ``The median of the differences ($\text{metric score}(\text{BELIEF}) - \text{metric score}(\text{SLICE})$) is not equal to zero''. AOPC and AUC are the metrics. D:M denotes Dataset:Model; O refers to Oxford-IIIT Pets and P refers to PASCAL VOC datasets. R denotes ResNet50 and I denotes Inception V3 models. W denotes the test statistic and CLES denotes the Common Language Effect Size.}
\label{tab:wilcoxon_results_belief_slice_fidelity}
\resizebox{0.46\textwidth}{!}{
\begin{tabular}{lllll}
\hline
\hline
\textbf{Test} & \textbf{D:M} & \textbf{W} & \textbf{p-value} & \textbf{CLES} \\
\hline
\multicolumn{5}{c}{AOPC Insertion}\\
\hline
AOPC(B,S) & O:I & 590 & .65 & 0.538 \\ 
AOPC(B,S) & O:R & 557 & .44 & 0.535 \\ 
AOPC(B,S) & P:I & 597 & .70 & 0.557 \\ 
AOPC(B,S) & P:R & 559 & .45 & 0.522 \\ 
\hline
\multicolumn{5}{c}{AOPC Deletion}\\
\hline
AOPC(B,S) & O:I & 589 & .65 & 0.537 \\ 
AOPC(B,S) & O:R & 567 & .50 & 0.528 \\ 
AOPC(B,S) & P:I & 596 & .69 & 0.557 \\ 
AOPC(B,S) & P:R & 548 & .39 & 0.521 \\ 
\hline
\multicolumn{5}{c}{AUC Insertion}\\
\hline
AUC(B,S) & O:I & 589 & .65 & 0.535 \\ 
AUC(B,S) & O:R & 546 & .38 & 0.546 \\ 
AUC(B,S) & P:I & 597 & .70 & 0.558 \\ 
AUC(B,S) & P:R & 555 & .43 & 0.525 \\ 
\hline
\multicolumn{5}{c}{AUC Deletion}\\
\hline
AUC(B,S) & O:I & 591 & .66 & 0.462 \\ 
AUC(B,S) & O:R & 553 & .42 & 0.460 \\ 
AUC(B,S) & P:I & 595 & .69 & 0.444 \\ 
AUC(B,S) & P:R & 547 & .39 & 0.478 \\ 
\hline
\hline
\end{tabular}
}
\end{table}
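
\noindent To illustrate how the two-sided p-values in \cref{tab:wilcoxon_results_belief_slice_fidelity} relate to $W$, the following sketch reproduces one row by hand. It assumes $n = 50$ paired instances (an inference from the tabulated values, not stated in the table), that $W$ is the smaller of the positive and negative rank sums, and that the p-value is obtained from the normal approximation without a continuity correction; $\Phi$ denotes the standard normal cumulative distribution function.
\[
\mu_W = \frac{n(n+1)}{4} = 637.5,
\qquad
\sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}} \approx 103.6,
\]
\[
z = \frac{590 - 637.5}{103.6} \approx -0.46,
\qquad
p = 2\,\Phi(z) \approx 0.65 .
\]
Under these assumptions, the AOPC Insertion row for O:I ($W = 590$, $p = .65$) lies less than half a standard deviation from the value expected under $H_{0}$, and the remaining rows, with $W$ between 546 and 597, lie within one standard deviation.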


\begin{table*}[!h]
\centering
\caption{Median ASFE and RMSE scores of our proposed Sign Entropy regularization and other approaches. Lower ASFE and RMSE scores are better. Rows R.1, R.5, and R1 correspond to the regularization hyper-parameter settings 0.1, 0.5, and 1, respectively. OLS does not have a regularization term and ARD does not have the \texttt{lambda\_init} hyper-parameter. Therefore, we ran these two methods without the regularization hyper-parameter settings and report the same ASFE and RMSE values in rows R.1, R.5, and R1 for OLS and ARD.}
\label{tab:median_coss_rmse_partial}
\resizebox{0.95\textwidth}{!}{
\begin{tabular}{l|cccccc|cccccc}
\hline
\hline
& \multicolumn{6}{c|}{ASFE $\downarrow$} & \multicolumn{6}{c}{RMSE $\downarrow$}\\
\cline{2-13}
{M} & {Proposed} & {Lasso} & {Ridge} & {Bayesian} & {ARD} & {OLS} & {Proposed} & {Lasso} & {Ridge} & {Bayesian} & {ARD} & {OLS}\\
 &  &  &  & {Ridge} &  &  &  &  &  & {Ridge} &  & \\
\hline
\hline
\multicolumn{13}{c}{Housing Price Dataset}\\
\hline
\hline
R.1  & 0.149 & 0.474 & 0.460 & 0.451 & 0.427 & 0.474 & 0.319 & 0.316 & 0.311 & 0.310 & 0.432 & 0.317\\
R.5  & 0.150 & 0.465 & 0.412 & 0.439 & 0.427 & 0.474 & 0.293 & 0.294 & 0.284 & 0.290 & 0.432 & 0.317\\
R1   & 0.149 & 0.462 & 0.398 & 0.443 & 0.427 & 0.474 & 0.343 & 0.345 & 0.307 & 0.337 & 0.432 & 0.317\\
\hline
\multicolumn{13}{c}{Energy Appliances Dataset}\\
\hline
R.1  & 0.004 & 0.029 & 0.183 & 0.198 & 0.160 & 0.192 & 0.502 & 0.518 & 0.472 & 0.473 & 0.470 & 0.469\\
R.5  & 0.004 & 0.000 & 0.201 & 0.194 & 0.160 & 0.192 & 0.502 & 0.595 & 0.474 & 0.473 & 0.470 & 0.469\\
R1   & 0.000 & 0.000 & 0.262 & 0.184 & 0.160 & 0.192 & 0.503 & 0.596 & 0.478 & 0.475 & 0.470 & 0.469\\
\hline
\hline
\end{tabular}
}
\end{table*}


%%%%% KS test

\begin{table}[htp]
\centering
\caption{Kolmogorov-Smirnov (KS) test results comparing the proposed Sign Entropy regularization with other methods. For each test, the null hypothesis $H_{0}$ was ``The distribution of the RMSE score of our proposed regularization is the same as that of the compared method,'' and the alternative hypothesis $H_{a}$ was ``The distributions are different.'' The KS statistic is the maximum distance between the two empirical cumulative distributions, and the p-value is the probability of observing a statistic at least as extreme under $H_{0}$. Results are grouped by dataset: Energy (E) and Housing (H). All p-values are larger than the commonly accepted threshold of 0.05 except for the proposed method vs. ARD on the Housing dataset (highlighted in red). However, as seen from the RMSE density plots in \cref{fig:asfe_rmse_regularization} and from \cref{tab:median_coss_rmse_partial}, the RMSE of ARD in this case is much higher than that of the other methods. Thus, we conclude that there is no statistically significant \textbf{increase} in the RMSE score due to our proposed Sign Entropy regularization.}
\label{tab:ks_test}
\resizebox{0.5\textwidth}{!}{
\begin{tabular}{lll}
\hline
\hline
\textbf{Test} & \textbf{KS Statistic} & \textbf{p-value} \\
\hline
\multicolumn{3}{c}{Energy Dataset (E)} \\
\hline
Proposed vs OLS & 0.200 & 0.731 \\ 
Proposed vs ARD & 0.200 & 0.731 \\ 
Proposed vs Bayesian Ridge & 0.133 & 0.825 \\ 
Proposed vs Lasso & 0.200 & 0.332 \\ 
Proposed vs Ridge & 0.111 & 0.948 \\ 
\hline
\multicolumn{3}{c}{Housing Dataset (H)} \\
\hline
Proposed vs OLS & 0.110 & 0.958 \\ 
Proposed vs ARD & 0.550 & \textcolor{red}{$<$0.001} \\ 
Proposed vs Bayesian Ridge & 0.050 & 0.999 \\ 
Proposed vs Lasso & 0.040 & 1.000 \\ 
Proposed vs Ridge & 0.050 & 0.999 \\ 
\hline
\hline
\end{tabular}
}
\end{table}

\clearpage
\newpage

