\section{Methods}

\subsection{Attention-Based Multiple Instance Learning}

We adopt the attention-based MIL formulation of \cite{ilse2018attentionbaseddeepmultipleinstance}. A bag
$X = \{x_i\}_{i=1}^{N}$ consists of $N$ instances, each mapped to a feature embedding $\mathbf{z}_i$ by a shared frozen encoder $f$:
\begin{equation}
\label{eq:inst_embed_sup}
\mathbf{z}_i = f(x_i), 
\qquad i = 1,\dots,N.
\end{equation}

\noindent The instance embeddings are aggregated into a bag-level representation by learnable attention-weighted pooling,
$\mathbf{Z} = \sum_{i=1}^{N} a_i \mathbf{z}_i$, where $a_i$ is the attention weight of instance $i$.
The bag representation is passed to a classifier $\varphi$ to obtain the final bag prediction
$Y = \varphi(\mathbf{Z})$.
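
\noindent For concreteness, one common choice of attention weights is the non-gated form of \cite{ilse2018attentionbaseddeepmultipleinstance}, sketched here under the assumption that this variant is used; the parameters $\mathbf{w}$ and $\mathbf{V}$ are notation introduced only for this illustration:
\begin{equation*}
a_i = \frac{\exp\!\big(\mathbf{w}^{\top}\tanh(\mathbf{V}\mathbf{z}_i)\big)}{\sum_{j=1}^{N}\exp\!\big(\mathbf{w}^{\top}\tanh(\mathbf{V}\mathbf{z}_j)\big)},
\qquad a_i > 0, \quad \sum_{i=1}^{N} a_i = 1 .
\end{equation*}
With this softmax normalization the weights form a distribution over instances, so the pooled representation is a convex combination of the instance embeddings.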

\subsection{Counterfactual Attention Intervention}

To explicitly model the causal contribution of attention to the prediction, we cast attention-based MIL as a structural causal model (SCM) \cite{pearl}, shown in Fig.~\ref{fig:model_and_scg}, with nodes $X$ (the WSI, i.e., the bag), $A$ (the attention distribution), and $Y$ (the bag-level prediction). The edge $X \rightarrow A$ indicates that attention is generated from $X$, $A \rightarrow Y$ indicates that attention influences the bag-level prediction, and $X \rightarrow Y$ indicates that the bag instances influence the bag-level prediction.


\noindent With a slight abuse of notation, let $X=\{\mathbf{z}_i\}_{i=1}^{N}$ denote the instance embeddings of the bag and $A=\{a_i\}_{i=1}^{N}$ the learned attention distribution. The standard prediction is:
\begin{equation}
Y(A, X) = \varphi\!\left(\sum_{i=1}^{N} a_i \mathbf{z}_i \right) = \varphi(\mathbf{Z}).
\label{eq:inference_eq}
\end{equation}

\noindent We introduce a counterfactual intervention on attention by cutting the edge $X \rightarrow A$, as shown in Fig.~\ref{fig:model_and_scg}(b), and measure its effect on the prediction:
\begin{equation}
Y(\text{do}(A=\bar{A}), X) =
\varphi\!\left(\sum_{i=1}^{N} \bar{a}_i \mathbf{z}_i \right) = \bar{Y},
\end{equation}
where the do-operator $\text{do}(\cdot)$ forcibly sets $A$ to a non-informative value $\bar{A}=\{\bar{a}_i\}_{i=1}^{N}$ while keeping $X$ fixed. The counterfactual attention $\bar{A} \sim \gamma$ is drawn from a non-informative distribution $\gamma$, e.g., random or uniform attention. The attention effect is then defined as:
\begin{equation}
Y_{\text{effect}} = 
\mathbb{E}_{\bar{A} \sim \gamma}
\big[
Y(A, X) - Y(\text{do}(A=\bar{A}), X)
\big]
\end{equation}
In practice, following \cite{baldi2014dropout}, we approximate the expectation with a single counterfactual attention sample per bag, which adds negligible training overhead and no additional inference cost. To encourage causally effective attention, we add a cross-entropy loss between the attention effect $Y_{\text{effect}}$ and the ground-truth bag label $y$. The final training objective is:

\begin{equation}
\mathcal{L}
=
\mathcal{L}_{\mathrm{CE}}(Y(A,X), y)
+
\lambda \, \mathcal{L}_{\mathrm{CE}}(Y_{\text{effect}}, y)
\end{equation}
where the hyperparameter $\lambda$ controls the strength of the counterfactual supervision.
The additional term encourages the model not only to solve the classification task but also to learn attention that has a causal effect on the prediction: a well-learned attention $A$ should lead to a correct prediction, whereas a random, non-informative attention $\bar{A}$ should not reach the same level of prediction accuracy.
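
\noindent As an illustration, suppose $\gamma$ places all its mass on uniform attention, $\bar{a}_i = 1/N$. The expectation is then exact and the attention effect reduces to
\begin{equation*}
Y_{\text{effect}} = \varphi\!\left(\sum_{i=1}^{N} a_i \mathbf{z}_i\right) - \varphi\!\left(\frac{1}{N}\sum_{i=1}^{N} \mathbf{z}_i\right),
\end{equation*}
i.e., the attention-pooled prediction minus the mean-pooled prediction. Under the further simplifying assumption of a linear classifier $\varphi(\mathbf{Z}) = \mathbf{W}\mathbf{Z} + \mathbf{b}$ (the symbols $\mathbf{W}$ and $\mathbf{b}$ are introduced only for this sketch and need not match the classifier used in practice), the bias cancels:
\begin{equation*}
Y_{\text{effect}} = \mathbf{W}\sum_{i=1}^{N}\left(a_i - \frac{1}{N}\right) \mathbf{z}_i ,
\end{equation*}
so in this simplified setting the effect vanishes when the learned attention is itself uniform and is driven entirely by the deviation of $A$ from uniform attention.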

\begin{figure}[t!]
\floatconts
  {fig:model_and_scg}
  {\caption{\textbf{Overview of \ours and its causal intervention module.} (a) \ours: Proposed model architecture. (b) Structural causal graph of the counterfactual attention intervention.}}
  {\includegraphics[width=\linewidth]{figs/model_310.png}}
  \vspace{-2.0em}
\end{figure}

\subsection{Attention-Based Perturbation Analysis}
We assess the faithfulness of MIL attention as an interpretability proxy using a region perturbation strategy \cite{hense2025xmilinsightfulexplanationsmultiple, alber2019innvestigate}. Patches are ranked by decreasing attention score and partitioned into $100$ disjoint, equally sized subsets $\{r_1,\dots,r_{100}\}$, so that $r_1$ contains the most attended patches. The perturbed slide at step $k$ is defined as:
\begin{equation}
\label{eq:perturb_system}
X^{(k)} = \bigcup_{i=k+1}^{100} r_i,
\qquad k = 0,\dots,99.
\end{equation}

\noindent The model's prediction is evaluated at each perturbation step to obtain a perturbation curve. For every slide $X$, we start from the original bag with all patches, $X^{(0)} = X$, and at each subsequent step $k \geq 1$ remove subset $r_{k}$, recording the model's prediction $s(k)$ on the perturbed slide $X^{(k)}$ for $k = 0,\dots,99$. To quantify the influence of attention on the prediction, we compute the Area Under the Perturbation Curve (AUPC). A lower AUPC indicates faster degradation of predictive confidence when highly attended regions are removed, and hence an attention mechanism that more faithfully explains the model's decision. Additional details are provided in Appendix C.
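
\noindent As a sketch of how such an area can be computed (one possible discretization, assumed here for illustration and not necessarily the normalization used in Appendix C), the trapezoidal rule over the $100$ evaluated steps, normalized by the number of intervals, gives
\begin{equation*}
\mathrm{AUPC} \approx \frac{1}{99}\sum_{k=0}^{98} \frac{s(k) + s(k+1)}{2} .
\end{equation*}
Under this convention, a prediction score $s(k) \in [0,1]$ yields an AUPC in $[0,1]$, and a monotonically decreasing curve that drops early yields a smaller value than one that drops late.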

