\section{Materials and Methods}\label{sec:methods}

\paragraph{Datasets.}
We evaluate on two annotated histopathology datasets.  
PanNuke~\cite{gamper2020pannuke} provides 7904 H\&E patches ($256{\times}256$) across 19 tissues with 189k nuclei labeled into five classes. We also use a proprietary breast Ki-67 IHC dataset~\cite{Anglada-Rotger_2024_CVPR} with 52 tiles ($1024{\times}1024$) from four patients, each containing pixel-level nuclei masks and three-class labels (positive, negative, non-epithelial). Both datasets are extracted at $40\times$ magnification with a spatial resolution of approximately $0.25\,\mu$m/pixel.





\paragraph{Evidential segmentation head and loss.}
DualU-Net~\cite{anglada2025dualunet} contains two decoders: a semantic segmentation head and a centroid-regression head.  
We keep this architecture but replace the segmentation logits with Dirichlet evidence.  
For each pixel $x$, the segmentation decoder outputs non-negative evidence values $e_k(x)\ge 0$, $k=1,\dots,K$, which define the Dirichlet concentration parameters $\alpha_k(x) = e_k(x) + 1$, collected in $\boldsymbol{\alpha}(x) = (\alpha_1(x), \dots, \alpha_K(x))$. The predictive class probabilities are given by the Dirichlet mean, $\hat{p}_k(x) = \alpha_k(x)/S(x)$ with Dirichlet strength $S(x) = \sum_{j=1}^K \alpha_j(x)$, and the predictive categorical distribution at pixel $x$ is $\hat{\mathbf{p}}(x) = (\hat{p}_1(x), \dots, \hat{p}_K(x))$. Following~\cite{sensoy2018evidential}, the evidential loss combines a data-fitting term, which encourages $\hat{\mathbf{p}}(x)$ to match the one-hot label $\mathbf{y}(x)$, with a KL regularizer that discourages unwarranted evidence. To penalize evidence for incorrect classes while leaving the correct class unpenalized, we construct the modified Dirichlet parameter vector
$\tilde{\boldsymbol{\alpha}}(x) = \big(\tilde{\alpha}_1(x),\ldots,\tilde{\alpha}_K(x)\big)$,
where each component is defined as
\begin{equation}
    \tilde{\alpha}_k(x) =
    \begin{cases}
        1, & \text{if } k = y(x),\\[4pt]
        \alpha_k(x), & \text{otherwise}.
    \end{cases}
    \label{eq:alpha-tilde}
\end{equation}
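Here $y(x)$ is the ground-truth class index of pixel $x$ and $\mathbf{y}(x)$ its one-hot encoding. Writing $D(\mathbf{p}\mid\boldsymbol{\alpha})$ for a Dirichlet distribution with parameters $\boldsymbol{\alpha}$, so that $D(\mathbf{p}\mid\mathbf{1})$ is the uniform Dirichlet, the per-pixel segmentation loss is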
\begin{equation}
    \mathcal{L}_{\mathrm{EDL}}^{\mathrm{seg}}(x)
    = \|\mathbf{y}(x)-\hat{\mathbf{p}}(x)\|_2^2
    + \lambda_{\mathrm{KL}}\,\mathrm{KL}\!\Big(
        D(\mathbf{p}\mid\tilde{\boldsymbol{\alpha}}(x))
        \;\big\|\;
        D(\mathbf{p}\mid\mathbf{1})
      \Big).
    \label{eq:edl-loss}
\end{equation}
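The per-pixel loss of Eqs.~\eqref{eq:alpha-tilde} and~\eqref{eq:edl-loss} can be sketched in plain Python for a single pixel. This is an illustrative sketch rather than the training code: the function names are hypothetical, the evidence vector is taken as given (the activation that makes it non-negative is not specified here), and the digamma function is approximated by a standard recurrence plus asymptotic series because Python's \texttt{math} module does not provide one. The KL term uses the closed form of the divergence between $D(\mathbf{p}\mid\tilde{\boldsymbol{\alpha}})$ and the uniform Dirichlet.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """Approximate digamma for x > 0 (recurrence + asymptotic expansion)."""
    result = 0.0
    while x < 6.0:              # shift x upward: psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def kl_to_uniform_dirichlet(alpha: Sequence[float]) -> float:
    """Closed-form KL( Dir(alpha) || Dir(1, ..., 1) )."""
    k = len(alpha)
    s = sum(alpha)
    kl = math.lgamma(s) - math.lgamma(k) - sum(math.lgamma(a) for a in alpha)
    kl += sum((a - 1.0) * (digamma(a) - digamma(s)) for a in alpha)
    return kl


def edl_pixel_loss(evidence: Sequence[float], label: int, lam_kl: float) -> float:
    """Hypothetical helper: squared-error data term plus KL on wrong-class evidence."""
    alpha = [e + 1.0 for e in evidence]            # alpha_k = e_k + 1
    s = sum(alpha)                                 # Dirichlet strength S
    p_hat = [a / s for a in alpha]                 # Dirichlet mean
    one_hot = [1.0 if k == label else 0.0 for k in range(len(alpha))]
    data_term = sum((y - p) ** 2 for y, p in zip(one_hot, p_hat))
    # alpha-tilde: true-class parameter reset to 1, so only wrong-class evidence is penalized
    alpha_tilde = [1.0 if k == label else a for k, a in enumerate(alpha)]
    return data_term + lam_kl * kl_to_uniform_dirichlet(alpha_tilde)


# Usage: evidence only on the true class incurs no KL penalty.
print(edl_pixel_loss([10.0, 0.0, 0.0], label=0, lam_kl=0.1))   # data term only
print(edl_pixel_loss([0.0, 10.0, 0.0], label=0, lam_kl=0.1))   # wrong-class evidence
```

Because $\tilde{\alpha}_{y(x)}$ is reset to $1$, a pixel whose evidence sits entirely on the correct class is only affected by the data term.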


As shown in~\cite{tan2025uncert}, incorporating a Dice term improves the optimization dynamics of evidential semantic segmentation.  
For this reason, all our experiments include an additional Dice component. In the original DualU-Net~\cite{anglada2025dualunet}, the Dice loss was class-weighted to mitigate strong label imbalance; however, such weighting is uncommon in EDL frameworks. We therefore evaluate two variants: (i) standard (unweighted) Dice and (ii) class-weighted Dice. The centroid decoder and its regression objective remain unchanged from the original DualU-Net. The full training objective is
\begin{equation}
    \mathcal{L}
    = \lambda_{\mathrm{seg}}\,\mathcal{L}^{\mathrm{seg}}_{\mathrm{EDL}}
    + \lambda_{\mathrm{Dice}}\,\mathcal{L}_{\mathrm{Dice}}
    + \lambda_{\mathrm{cent}}\,\mathcal{L}_{\mathrm{cent}}.
    \label{eq:total-loss}
\end{equation}
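The two Dice variants can be sketched as a soft Dice loss over flattened per-class maps. The sketch assumes that the Dice term is computed on the Dirichlet mean $\hat{\mathbf{p}}$, and it leaves the class weights as an input because the weighting scheme of the original DualU-Net is not restated here; the function name and the smoothing constant \texttt{eps} are illustrative choices.

```python
from typing import Optional, Sequence


def soft_dice_loss(
    p_hat: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    eps: float = 1e-6,
) -> float:
    """Hypothetical soft Dice loss over K classes.

    p_hat[k][n] and target[k][n] hold the predicted probability and the one-hot
    label of class k at pixel n. weights=None gives the standard (unweighted)
    Dice; otherwise per-class Dice scores are averaged with the given weights.
    """
    n_classes = len(p_hat)
    w = [1.0] * n_classes if weights is None else list(weights)
    dice_scores = []
    for pk, yk in zip(p_hat, target):
        intersection = sum(p * y for p, y in zip(pk, yk))
        denom = sum(pk) + sum(yk)
        dice_scores.append((2.0 * intersection + eps) / (denom + eps))
    weighted = sum(wk * dk for wk, dk in zip(w, dice_scores))
    return 1.0 - weighted / sum(w)


# Usage: two pixels, two classes; confident on pixel 0, uncertain on pixel 1.
p = [[0.9, 0.5], [0.1, 0.5]]
y = [[1.0, 0.0], [0.0, 1.0]]
print(soft_dice_loss(p, y))                       # standard Dice
print(soft_dice_loss(p, y, weights=[1.0, 3.0]))   # up-weights class 1
```

Up-weighting the class with the lower Dice score raises the loss, which is how class weighting shifts attention toward under-represented classes.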

\paragraph{Segmentation-head evidential uncertainty.}
Let $\mathcal{D}$ be the training dataset and $\hat{y}$ the categorical prediction at pixel $x$, represented as a one-hot random vector $\hat{y}\sim\mathrm{Cat}(\mathbf{p})$ whose class probabilities are themselves Dirichlet distributed, $\mathbf{p}\sim D(\mathbf{p}\mid\boldsymbol{\alpha}(x))$. For such a Bayesian classifier, and following~\cite{gal2017uncertainties,tan2024edlbiomed}, we define the aleatoric and epistemic uncertainties as
$u_{\mathrm{ale}}(x) = \mathbb{E}_{\mathrm{Dir}}\big[\mathrm{Var}_{\mathrm{Cat}}(\hat{y}\mid\mathbf{p})\big]$ and
$u_{\mathrm{epi}}(x) = \mathrm{Var}_{\mathrm{Dir}}\big(\mathbb{E}_{\mathrm{Cat}}[\hat{y}\mid\mathbf{p}]\big)$,
where each variance is summed over the $K$ components of $\hat{y}$. Because $\mathbf{p}$ is Dirichlet distributed, both quantities admit the following closed forms (see Appendix~\ref{ap:formulas}):
\begin{equation}
    u_{\mathrm{ale}}(x)
    = \sum_{k=1}^K
      \frac{\alpha_k(x)\big(S(x)-\alpha_k(x)\big)}
           {S(x)\big(S(x)+1\big)},
    \qquad
    u_{\mathrm{epi}}(x)
    = \sum_{k=1}^K
      \frac{\alpha_k(x)\big(S(x)-\alpha_k(x)\big)}
           {S^2(x)\big(S(x)+1\big)}.
    \label{eq:epi_ale}
\end{equation}
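The closed forms in Eq.~\eqref{eq:epi_ale} can be checked numerically against their definitions by sampling $\mathbf{p}\sim D(\mathbf{p}\mid\boldsymbol{\alpha})$ through normalized Gamma variates. The sketch treats $\hat{y}$ as a one-hot vector and sums variances over classes, which is the reading under which the closed forms hold; the function names, the example $\boldsymbol{\alpha}$, and the sample size are illustrative.

```python
import random
from typing import Sequence, Tuple


def closed_form_uncertainties(alpha: Sequence[float]) -> Tuple[float, float]:
    """Aleatoric and epistemic uncertainty from the closed forms (eq:epi_ale)."""
    s = sum(alpha)
    spread = sum(a * (s - a) for a in alpha)
    return spread / (s * (s + 1.0)), spread / (s * s * (s + 1.0))


def monte_carlo_uncertainties(
    alpha: Sequence[float], n_samples: int = 50_000, seed: int = 0
) -> Tuple[float, float]:
    """Hypothetical check: estimate E_Dir[Var_Cat(y|p)] and Var_Dir(E_Cat[y|p])."""
    rng = random.Random(seed)
    k = len(alpha)
    ale_sum = 0.0
    mean_acc = [0.0] * k
    sq_acc = [0.0] * k
    for _ in range(n_samples):
        # A Dirichlet sample is a vector of independent Gamma(alpha_k, 1) draws, normalized.
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        p = [gk / total for gk in g]
        # Variance of a one-hot Cat(p) vector, summed over classes: sum_k p_k (1 - p_k).
        ale_sum += sum(pk * (1.0 - pk) for pk in p)
        # E_Cat[y | p] = p, so accumulate moments of p for its Dirichlet variance.
        for j, pk in enumerate(p):
            mean_acc[j] += pk
            sq_acc[j] += pk * pk
    u_ale = ale_sum / n_samples
    u_epi = sum(sq_acc[j] / n_samples - (mean_acc[j] / n_samples) ** 2 for j in range(k))
    return u_ale, u_epi


alpha = [2.0, 3.0, 5.0]
print(closed_form_uncertainties(alpha))
print(monte_carlo_uncertainties(alpha))   # agrees up to sampling error
```

Note that the two expressions differ only by a factor of $S(x)$, i.e.\ $u_{\mathrm{epi}}(x) = u_{\mathrm{ale}}(x)/S(x)$, so epistemic uncertainty decays faster as evidence accumulates.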

A third quantity arises naturally in evidential models: vacuity. Whereas aleatoric and epistemic uncertainties separate data noise from model uncertainty, vacuity measures the lack of evidence accumulated from the data, $u_{\mathrm{vac}}(x) = \frac{K}{S(x)}$. Since $\alpha_k(x)\ge 1$ implies $S(x)\ge K$, vacuity lies in $(0,1]$ and equals $1$ only when no evidence is present.
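The difference between the three pixel-level quantities is easiest to see on two Dirichlets with the same mean but different strength. The helper below is a hypothetical illustration that evaluates Eq.~\eqref{eq:epi_ale} and the vacuity $K/S$ for an arbitrary $\boldsymbol{\alpha}$.

```python
from typing import Dict, Sequence


def dirichlet_uncertainties(alpha: Sequence[float]) -> Dict[str, object]:
    """Hypothetical helper: Dirichlet mean plus aleatoric, epistemic, and vacuity."""
    k = len(alpha)
    s = sum(alpha)
    spread = sum(a * (s - a) for a in alpha)
    return {
        "p_hat": [a / s for a in alpha],
        "u_ale": spread / (s * (s + 1.0)),
        "u_epi": spread / (s * s * (s + 1.0)),
        "u_vac": k / s,
    }


weak = dirichlet_uncertainties([3.0, 1.0, 1.0])       # S = 5
strong = dirichlet_uncertainties([30.0, 10.0, 10.0])  # S = 50, same mean (0.6, 0.2, 0.2)
for name, d in (("weak", weak), ("strong", strong)):
    print(name, {key: d[key] for key in ("u_ale", "u_epi", "u_vac")})
```

Both inputs share the mean $(0.6, 0.2, 0.2)$, yet vacuity drops from $0.6$ to $0.06$ and epistemic uncertainty shrinks roughly eightfold, while aleatoric uncertainty only moves from about $0.47$ to $0.55$, toward its limit $1-\sum_k \hat{p}_k^2 = 0.56$.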

Cell analysis requires uncertainty not only at the pixel level but also at the instance level, since downstream evaluation (detection F1, classification F1) and clinical interpretation are performed per nucleus rather than per pixel. Instance masks $\Omega_i$ are obtained with the same watershed reconstruction as in DualU-Net (see Section~\ref{sec:sota}).

In evidential classification, Dirichlet parameters are commonly interpreted as evidence accumulated from independent observations, in which case evidence is additive in the underlying Gamma space. Under this interpretation, each pixel provides a local Dirichlet evidence vector over classes, and if pixel-level evidence vectors were conditionally independent samples of the same latent instance-level variable, principled Bayesian aggregation would amount to summing evidence across pixels. However, pixel-level predictions within a nucleus are not independent: they are spatially correlated, share receptive fields, and are influenced by common morphological context. Moreover, nucleus size varies substantially, so summing evidence would make the total concentration $S$ scale with instance area, artificially suppressing epistemic uncertainty and vacuity for larger nuclei. For each instance, we therefore aggregate the evidential parameters by averaging:
\begin{equation}
\bar{\alpha}_k^{(i)}
=
\frac{1}{|\Omega_i|}
\sum_{x\in\Omega_i}\alpha_k(x),
\qquad
\bar{S}^{(i)}=\sum_{k=1}^K\bar{\alpha}_k^{(i)}.
\label{eq:inst-alpha}
\end{equation}

This operation should be understood as a pooling of correlated pixel-level evidence rather than as a Bayesian evidence fusion rule. Averaging preserves the relative evidence proportions learned by the network while enforcing size invariance across instances, yielding a stable instance-level evidence profile from which uncertainty quantities can be consistently derived. An ablation study comparing mean, sum, and median pooling for instance-level aggregation is presented in Appendix~\ref{ap:agg}.
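The size-invariance argument can be illustrated by pooling synthetic pixel evidence for a small and a large nucleus. The sketch below is hypothetical: it compares mean pooling (Eq.~\eqref{eq:inst-alpha}) with summation and with a component-wise median, the latter being an assumed reading of median pooling; all nuclei and $\boldsymbol{\alpha}$ values are made up.

```python
import statistics
from typing import List, Sequence


def pool_instance_alpha(pixel_alphas: Sequence[Sequence[float]], mode: str = "mean") -> List[float]:
    """Hypothetical helper: pool per-pixel Dirichlet parameters over one instance mask."""
    n_classes = len(pixel_alphas[0])
    columns = [[px[k] for px in pixel_alphas] for k in range(n_classes)]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "sum":
        return [sum(c) for c in columns]
    if mode == "median":                       # component-wise median (assumed variant)
        return [statistics.median(c) for c in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def vacuity(alpha: Sequence[float]) -> float:
    return len(alpha) / sum(alpha)


pixel = [1.0, 6.0, 2.0, 1.0]                  # same per-pixel evidence everywhere
small = [pixel] * 50                          # 50-pixel nucleus
large = [pixel] * 800                         # 800-pixel nucleus
for mode in ("mean", "sum", "median"):
    print(mode, vacuity(pool_instance_alpha(small, mode)), vacuity(pool_instance_alpha(large, mode)))
```

With identical per-pixel evidence, mean and median pooling give the same vacuity for both nuclei, whereas summation lowers it in proportion to area (here by a factor of 16).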

At the pixel level, all Dirichlet parameters, including the background class, contribute to uncertainty because they shape the full predictive distribution.
For instance-level uncertainty, however, we are interested only in the reliability of the classification of a segmented nucleus. When computing instance-level uncertainty, we therefore exclude the background component from $\bar{\boldsymbol{\alpha}}^{(i)}$ and renormalize over the $K{-}1$ foreground classes, i.e., $\bar{S}^{(i)}$ and the class count are computed over the foreground classes only.
This ensures that the resulting uncertainties quantify uncertainty about the nucleus class rather than about residual background evidence. Substituting the foreground-only $\bar{\boldsymbol{\alpha}}^{(i)}$ into Eq.~\eqref{eq:epi_ale} and the vacuity definition yields the instance-level uncertainties $u_{\mathrm{ale}}(\Omega_i)$, $u_{\mathrm{epi}}(\Omega_i)$, and $u_{\mathrm{vac}}(\Omega_i)$. To make these quantities directly comparable and easily interpretable, we normalize each of them to the range $[0,1]$. Each expression admits a closed-form theoretical minimum and maximum determined by the Dirichlet parameters $\boldsymbol{\alpha}$ (see Appendix~\ref{ap:limits}); for each uncertainty type, we compute these attainable bounds and apply an affine map from them to $[0,1]$.
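The foreground-only computation and the affine normalization can be sketched as follows. This is an illustration under stated assumptions: background is taken to be class index $0$, renormalizing is read as recomputing $\bar{S}^{(i)}$ and the class count over the $K{-}1$ foreground classes, and the bounds passed to the normalization stand in for the closed-form limits of Appendix~\ref{ap:limits}, which are not reproduced here. Clipping to $[0,1]$ is an added safeguard against numerical overshoot.

```python
from typing import Dict, Sequence


def instance_uncertainties(mean_alpha: Sequence[float], background_index: int = 0) -> Dict[str, float]:
    """Hypothetical helper: instance-level u_ale, u_epi, u_vac with background excluded."""
    fg = [a for k, a in enumerate(mean_alpha) if k != background_index]
    k_fg = len(fg)            # K - 1 foreground classes
    s_fg = sum(fg)            # strength recomputed over foreground classes only (assumption)
    spread = sum(a * (s_fg - a) for a in fg)
    return {
        "u_ale": spread / (s_fg * (s_fg + 1.0)),
        "u_epi": spread / (s_fg * s_fg * (s_fg + 1.0)),
        "u_vac": k_fg / s_fg,
    }


def affine_normalize(u: float, u_min: float, u_max: float) -> float:
    """Map u from [u_min, u_max] to [0, 1]; the bounds are supplied by the caller."""
    if u_max <= u_min:
        raise ValueError("u_max must be larger than u_min")
    return min(1.0, max(0.0, (u - u_min) / (u_max - u_min)))


# Strong background evidence (index 0) does not reduce the foreground vacuity.
print(instance_uncertainties([50.0, 1.0, 1.0]))
print(instance_uncertainties([1.0, 9.0, 1.0]))
```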

\paragraph{Centroid-head uncertainty.}
Kendall and Gal~\cite{gal2017uncertainties} provide a standard probabilistic framework for regression that minimizes the Gaussian negative log-likelihood (NLL); we instead opt for a geometric approach. The centroid regression head itself follows the original DualU-Net formulation, without any architectural modification. In sparse centroid regression, the NLL objective is not only prone to optimization instability due to the strong imbalance between the few peak pixels and the large background, but it also models only per-pixel intensity noise. In contrast, our geometric reliability measures target structural failures such as missing, diffuse, or overly dominant centroid responses.


Let $g:\mathcal{X}\to[0,\infty)$ denote the Gaussian density map predicted by the centroid decoder, where $g(x)$ is the value at pixel $x$.  
For each reconstructed nucleus instance $\Omega_{i}\subset\mathcal{X}$, the target response is assumed to be an isotropic Gaussian with unit peak and standard deviation $\sigma$; such an ideal response integrates to the analytic mass $G_{\max}=2\pi\sigma^{2}$.  
Departures of $g$ from this template reflect unreliable centroid localization.  
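The analytic mass can be checked by summing a unit-peak isotropic Gaussian over a pixel grid; with unit pixel spacing, the discrete sum matches $2\pi\sigma^2$ to high precision once $\sigma$ spans a few pixels. The helper name and the values of $\sigma$ are illustrative. The second call shows that a window cropped to $\pm\sigma$ recovers only part of the mass, so an instance mask that truncates the Gaussian yields a predicted mass below $G_{\max}$ even for a well-localized peak.

```python
import math


def gaussian_mass_on_grid(sigma: float, radius: int) -> float:
    """Hypothetical helper: sum of exp(-(i^2 + j^2) / (2 sigma^2)) over a square window."""
    total = 0.0
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            total += math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
    return total


sigma = 3.0
g_max = 2.0 * math.pi * sigma ** 2
print(gaussian_mass_on_grid(sigma, radius=30) / g_max)   # close to 1: full support
print(gaussian_mass_on_grid(sigma, radius=3) / g_max)    # well below 1: cropped at +/- sigma
```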
We extract two complementary geometric cues:  
(i) \emph{Peak uncertainty}, which assesses the sharpness of the predicted Gaussian by the maximum value 
$p_{\max}^{(i)}=\max_{x\in\Omega_{i}} g(x)$; diffuse or weak responses indicate uncertain detections.  
We define
\begin{equation}
    u_{\mathrm{peak}}(\Omega_{i}) = 1 - p_{\max}^{(i)}.
\end{equation}
(ii) \emph{Mass-ratio uncertainty}, which measures energy preservation.  
Let $m_{\mathrm{pred}}^{(i)}=\sum_{x\in\Omega_{i}} g(x)$ denote the predicted mass; deviations from $G_{\max}$ are quantified symmetrically as
\begin{equation}
    u_{\mathrm{mass}}(\Omega_{i})
    = \frac{\big|m_{\mathrm{pred}}^{(i)} - G_{\max}\big|}{G_{\max}}.
    \label{eq:umass_align}
\end{equation}
Values near zero correspond to correct centroid strength, whereas large deviations signal missing, diffuse, or overly dominant Gaussian responses. These two cues provide simple and direct measures of centroid reliability for each nucleus.  
A single scalar uncertainty value is obtained via the linear combination
$
u_{\mathrm{cent}}(\Omega_{i})
= \lambda_{\mathrm{peak}}\,u_{\mathrm{peak}}(\Omega_{i})
+ \lambda_{\mathrm{mass}}\,u_{\mathrm{mass}}(\Omega_{i}),
$
where $\lambda_{\mathrm{peak}}$ and $\lambda_{\mathrm{mass}}$ are weighting coefficients.
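Both geometric cues and their combination reduce to a few operations on the density values inside one instance mask. The sketch below assumes that $g$ follows the unit-peak Gaussian template described above and that $\lambda_{\mathrm{peak}}$ and $\lambda_{\mathrm{mass}}$ are given; the helper that renders a synthetic response, its amplitude parameter, and the chosen weights are hypothetical and only contrast a full-strength with a half-strength detection.

```python
import math
from typing import Dict, List, Sequence


def centroid_uncertainty(
    g_values: Sequence[float], sigma: float, lam_peak: float, lam_mass: float
) -> Dict[str, float]:
    """Peak, mass-ratio, and combined uncertainty for the pixels of one nucleus mask."""
    g_max = 2.0 * math.pi * sigma ** 2          # analytic mass of a unit-peak Gaussian
    p_max = max(g_values)                       # highest response inside the mask
    m_pred = sum(g_values)                      # predicted mass inside the mask
    u_peak = 1.0 - p_max
    u_mass = abs(m_pred - g_max) / g_max
    return {"u_peak": u_peak, "u_mass": u_mass,
            "u_cent": lam_peak * u_peak + lam_mass * u_mass}


def synthetic_response(sigma: float, radius: int, amplitude: float) -> List[float]:
    """Hypothetical predicted density: a scaled Gaussian on a square mask."""
    return [amplitude * math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)]


sigma = 3.0
sharp = centroid_uncertainty(synthetic_response(sigma, 15, 1.0), sigma, 0.5, 0.5)
weak = centroid_uncertainty(synthetic_response(sigma, 15, 0.5), sigma, 0.5, 0.5)
print(sharp)   # u_peak = 0, u_mass close to 0
print(weak)    # u_peak = 0.5, u_mass close to 0.5
```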



\paragraph{Two uncertainties for two error types.}
For each nucleus $\Omega_i$, our method outputs two complementary uncertainty families.  
Segmentation-head evidential uncertainties
$\big(u_{\mathrm{epi}}(\Omega_i),\,u_{\mathrm{ale}}(\Omega_i),\,u_{\mathrm{vac}}(\Omega_i)\big)$
reflect ambiguity in the class distribution and are therefore linked to \emph{classification} errors.  
Centroid-based geometric scores $\big(u_{\mathrm{cent}}(\Omega_i),\,u_{\mathrm{peak}}(\Omega_i),\,u_{\mathrm{mass}}(\Omega_i)\big)$ capture the sharpness and mass consistency of the predicted Gaussian response, making them indicative of \textit{detection} errors. Together, they provide complementary instance-level reliability signals.


