\section{Discussion and Conclusions}\label{sec:discussion}

We have introduced an evidential formulation of DualU-Net that provides, in a single forward pass, two complementary uncertainty families: segmentation-driven evidential uncertainty (aleatoric, epistemic, vacuity) targeting classification errors, and centroid-derived geometric uncertainty (peak and mass) targeting detection and localisation errors. Together, they offer a unified decomposition of instance-level reliability that aligns with the two dominant failure modes in cell instance segmentation.

As shown in Table~\ref{tab:performance}, incorporating EDL into the DualU-Net architecture does not degrade predictive performance: our evidential variants achieve results comparable to all considered baselines, with minor improvements over the deterministic Base model but no statistically significant differences with respect to Base, DE, or MCD. Across PanNuke and Ki-67, the evidential scheme consistently outperforms the deterministic baseline and matches or surpasses DE and MCD in error separation, while being substantially more efficient. The three segmentation-head uncertainties differ qualitatively: aleatoric uncertainty tends to produce higher intensities, whereas epistemic uncertainty and vacuity span wider dynamic ranges (Figure~\ref{fig:segmentation_results} and Appendix~\ref{ap:plots_seg}). Their quantitative behaviour is nevertheless statistically indistinguishable in terms of error discrimination (Table~\ref{tab:segmentation_and_centroid_uncertainty}). In PanNuke, this alignment is visually evident (Figure~\ref{fig:qualitative_results}): all three uncertainties assign high values to the same problematic nuclei, highlighting classification mistakes, false positives, or low-confidence predictions that merit inspection.

Regarding calibration, the evidential formulation improves mean calibration relative to DE and MCD, bringing predicted confidence closer to empirical correctness (Table~\ref{tab:segmentation_and_centroid_uncertainty}). The strong calibration of DE and MCD is expected, as both average over multiple stochastic predictors, which inherently smooths confidence estimates. Despite relying on a single forward pass, our method achieves comparable or better calibration while maintaining similar uncertainty--error alignment at a much lower computational cost. Under-confidence at low predicted probabilities is an intrinsic effect of evidential modelling: when evidence is limited, vacuity dominates and the Dirichlet mean is drawn toward the uniform prior, reducing confidence even for correct predictions (Figure~\ref{fig:segmentation_calibration}).
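
This pull toward the uniform prior can be made explicit under the standard subjective-logic parameterisation commonly used in evidential deep learning. The notation below ($e_k$, $\alpha_k$, $S$, $b_k$, $u$, $K$) is assumed here for illustration and may differ from the symbols used elsewhere in the paper. With non-negative evidence $e_k$ for each of $K$ classes and a uniform Dirichlet prior,
\[
\alpha_k = e_k + 1, \qquad S = \sum_{k=1}^{K} \alpha_k, \qquad b_k = \frac{e_k}{S}, \qquad u = \frac{K}{S},
\]
so that $\sum_{k=1}^{K} b_k + u = 1$ and the Dirichlet mean decomposes as
\[
\hat{p}_k = \frac{\alpha_k}{S} = b_k + \frac{u}{K}.
\]
As the total evidence vanishes, $u \to 1$ and $\hat{p}_k \to 1/K$ for every class. As a purely hypothetical example, take $K = 3$ and evidence $(e_1, e_2, e_3) = (2, 0, 0)$: then $S = 5$, $u = 0.6$, and $\hat{p} = (0.6, 0.2, 0.2)$, so the predicted confidence is only $0.6$ even though all collected evidence supports the first class.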


The class-weighted variant (\emph{Ours w}) exposes a clear trade-off between rare-class performance and calibration. By amplifying gradients for underrepresented classes, class weighting promotes stronger evidence accumulation and improves fold-wise F1 scores, particularly for the Necrotic class in PanNuke (Table~\ref{tab:performance}). At the same time, this reduces the regularising effect of the evidential KL term in low-sample regimes, allowing the model to become overly confident when evidence is scarce. Consequently, \emph{Ours w} shows increased calibration error (e.g., higher MCE and M-UCE), reflecting a tension between enhancing rare-class sensitivity and maintaining conservative uncertainty estimates.

For the centroid head, the proposed Gaussian-based uncertainty measures are simple, interpretable, and add no computational cost at inference. Mass-based uncertainty is consistently the strongest cue, while peak uncertainty provides complementary information; their combination yields the best KS and AUROC values (Table~\ref{tab:segmentation_and_centroid_uncertainty}). Qualitative examples confirm that high-uncertainty instances correspond to misdetections, poor localisations, or ambiguous annotations, demonstrating the practical interpretability of these geometric cues (Figure~\ref{fig:qualitative_results}). These cues rely on a Gaussian template with fixed standard deviation $\sigma$, which encodes an implicit prior on nucleus scale inherited from the original DualU-Net formulation. While the same $\sigma$ generalises across PanNuke (H\&E on 19 different tissue types) and Ki-67 (a different staining modality) without re-tuning, applying the method to datasets with a substantially different microns-per-pixel resolution or nucleus size distribution would require re-tuning this hyperparameter.

Importantly, all hyperparameters optimised on PanNuke transfer directly to Ki-67 without re-tuning, highlighting the cross-dataset generalisation of the evidential framework and its robustness under domain shift. The ability to surface uncertainty at inference time enables model introspection for pathologists and supports downstream applications such as active learning, quality control of annotations, and uncertainty-aware dataset curation.

Finally, we acknowledge recent work highlighting limitations of standard evidential formulations, including sensitivity to prior design choices and optimisation objectives that may induce over-confidence under certain conditions \cite{chen2024redl, chen2025reedl}. While our results demonstrate that a streamlined evidential formulation is already effective and competitive in a challenging multi-task instance segmentation setting, exploring such refinements within DualU-Net constitutes a natural avenue for future work.

To our knowledge, this is the first evidential instance segmentation model in a multi-task setting for digital pathology, demonstrating both methodological and practical value. Future work includes extending evidential modelling to centroid regression via Normal--Inverse--Gamma uncertainty \cite{edl_reg}, enabling a fully evidential DualU-Net architecture.

