\section{Ablation Study on Instance-level Dirichlet Aggregation}
\label{ap:agg}

\begin{table}[t]
\centering
\caption{Instance-level uncertainty evaluation on PanNuke under different aggregation strategies (mean, sum, and median) for Dirichlet parameters.
Results are reported for the two evidential variants only (\textit{Ours} and \textit{Ours w}). M: model variant; Agg.: aggregation strategy; UM: uncertainty measure. Metrics and evaluation protocol follow the main paper.}
\label{tab:agg_ablation}

\begin{minipage}[t]{0.8\textwidth}
\centering

\rowcolors{2}{gray!15}{white}
\resizebox{\linewidth}{!}{%
\begin{tabular}{lllcccccc}
\hline
\rowcolor{gray!30}
\textbf{M} & \textbf{Agg.} & \textbf{UM} &
\textbf{ACE $\downarrow$} & \textbf{MCE $\downarrow$} &
\textbf{A-UCE $\downarrow$} & \textbf{M-UCE $\downarrow$} &
\textbf{KS $\uparrow$} & \textbf{AUROC $\uparrow$} \\
\hline
\rowcolor{gray!30}
\multicolumn{9}{c}{\textbf{PanNuke}}\\
\hline
\textit{Ours} & Mean & $u_{\text{ale}}$
    & \textbf{0.061}$_{\pm0.004}$
    & \textbf{0.289}$_{\pm0.010}$
    & 0.157$_{\pm0.010}$
    & 0.326$_{\pm0.025}$
    & 0.392$_{\pm0.003}$
    & 0.759$_{\pm0.003}$ \\
& Mean & $u_{\text{epi}}$
    & \textbf{0.061}$_{\pm0.004}$
    & \textbf{0.289}$_{\pm0.010}$
    & 0.100$_{\pm0.004}$
    & 0.251$_{\pm0.017}$
    & 0.392$_{\pm0.003}$
    & 0.759$_{\pm0.003}$ \\
& Mean & $u_{\text{vac}}$
    & \textbf{0.061}$_{\pm0.004}$
    & \textbf{0.289}$_{\pm0.010}$
    & \textbf{0.054}$_{\pm0.004}$
    & \textbf{0.246}$_{\pm0.016}$
    & 0.391$_{\pm0.003}$
    & 0.758$_{\pm0.003}$ \\
\hline
\textit{Ours} & Sum & $u_{\text{ale}}$
    & 0.061$_{\pm0.004}$
    & 0.289$_{\pm0.010}$
    & 0.185$_{\pm0.012}$
    & 0.428$_{\pm0.019}$
    & 0.392$_{\pm0.003}$
    & 0.759$_{\pm0.003}$ \\
& Sum & $u_{\text{epi}}$
    & 0.061$_{\pm0.004}$
    & 0.289$_{\pm0.010}$
    & 0.216$_{\pm0.004}$
    & 0.468$_{\pm0.008}$
    & 0.371$_{\pm0.005}$
    & 0.743$_{\pm0.003}$ \\
& Sum & $u_{\text{vac}}$
    & 0.061$_{\pm0.004}$
    & 0.289$_{\pm0.010}$
    & 0.216$_{\pm0.004}$
    & 0.460$_{\pm0.007}$
    & 0.350$_{\pm0.005}$
    & 0.729$_{\pm0.003}$ \\
\hline
\textit{Ours} & Median & $u_{\text{ale}}$
    & 0.066$_{\pm0.003}$
    & 0.312$_{\pm0.008}$
    & 0.167$_{\pm0.004}$
    & 0.360$_{\pm0.007}$
    & 0.376$_{\pm0.002}$
    & 0.752$_{\pm0.003}$ \\
& Median & $u_{\text{epi}}$
    & 0.066$_{\pm0.003}$
    & 0.312$_{\pm0.008}$
    & 0.103$_{\pm0.004}$
    & 0.283$_{\pm0.004}$
    & 0.376$_{\pm0.002}$
    & 0.753$_{\pm0.003}$ \\
& Median & $u_{\text{vac}}$
    & 0.066$_{\pm0.003}$
    & 0.312$_{\pm0.008}$
    & 0.060$_{\pm0.002}$
    & 0.277$_{\pm0.002}$
    & 0.376$_{\pm0.002}$
    & 0.753$_{\pm0.003}$ \\
\hline
\textit{Ours w} & Mean & $u_{\text{ale}}$
    & 0.095$_{\pm0.003}$
    & 0.383$_{\pm0.005}$
    & 0.175$_{\pm0.005}$
    & 0.382$_{\pm0.008}$
    & 0.442$_{\pm0.005}$
    & 0.791$_{\pm0.002}$ \\
& Mean & $u_{\text{epi}}$
    & 0.095$_{\pm0.003}$
    & 0.383$_{\pm0.005}$
    & 0.113$_{\pm0.002}$
    & 0.333$_{\pm0.002}$
    & \textbf{0.442}$_{\pm0.005}$
    & \textbf{0.796}$_{\pm0.003}$ \\
& Mean & $u_{\text{vac}}$
    & 0.095$_{\pm0.003}$
    & 0.383$_{\pm0.005}$
    & 0.080$_{\pm0.003}$
    & 0.321$_{\pm0.003}$
    & 0.441$_{\pm0.005}$
    & 0.796$_{\pm0.003}$ \\
\hline
\textit{Ours w} & Sum & $u_{\text{ale}}$
    & 0.094$_{\pm0.003}$
    & 0.383$_{\pm0.005}$
    & 0.217$_{\pm0.005}$
    & 0.499$_{\pm0.014}$
    & 0.442$_{\pm0.005}$
    & 0.795$_{\pm0.003}$ \\
& Sum & $u_{\text{epi}}$
    & 0.094$_{\pm0.003}$
    & 0.383$_{\pm0.005}$
    & 0.245$_{\pm0.006}$
    & 0.545$_{\pm0.008}$
    & 0.429$_{\pm0.004}$
    & 0.775$_{\pm0.001}$ \\
& Sum & $u_{\text{vac}}$
    & 0.094$_{\pm0.003}$
    & 0.383$_{\pm0.005}$
    & 0.245$_{\pm0.006}$
    & 0.528$_{\pm0.010}$
    & 0.413$_{\pm0.002}$
    & 0.764$_{\pm0.001}$ \\
\hline
\textit{Ours w} & Median & $u_{\text{ale}}$
    & 0.109$_{\pm0.004}$
    & 0.418$_{\pm0.009}$
    & 0.184$_{\pm0.004}$
    & 0.416$_{\pm0.017}$
    & 0.431$_{\pm0.004}$
    & 0.783$_{\pm0.002}$ \\
& Median & $u_{\text{epi}}$
    & 0.109$_{\pm0.004}$
    & 0.418$_{\pm0.009}$
    & 0.126$_{\pm0.002}$
    & 0.370$_{\pm0.005}$
    & 0.431$_{\pm0.004}$
    & 0.791$_{\pm0.002}$ \\
& Median & $u_{\text{vac}}$
    & 0.109$_{\pm0.004}$
    & 0.418$_{\pm0.009}$
    & 0.096$_{\pm0.005}$
    & 0.359$_{\pm0.006}$
    & 0.431$_{\pm0.004}$
    & 0.791$_{\pm0.002}$ \\

\hline

\end{tabular}}
\end{minipage}
\end{table}



\begin{figure}[t]
\centering

% --- Top row: mean aggregation ---
\includegraphics[width=0.30\textwidth]{figures/plots/seg_ins_pannuke/hist_edl_ale.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/seg_ins_pannuke/hist_edl_epi.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/seg_ins_pannuke/hist_vacuity.png}


% --- Middle row: sum aggregation ---
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_sum_pannuke/hist_edl_ale.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_sum_pannuke/hist_edl_epi.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_sum_pannuke/hist_vacuity.png}

% --- Bottom row: median aggregation ---
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_median_pannuke/hist_edl_ale.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_median_pannuke/hist_edl_epi.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_median_pannuke/hist_vacuity.png}

\caption{Instance-level histograms of the segmentation uncertainties on PanNuke for \textit{Ours}. Columns, left to right: $u_{\mathrm{ale}}$, $u_{\mathrm{epi}}$, $u_{\mathrm{vac}}$. Rows, top to bottom: \textit{mean}, \textit{sum}, and \textit{median} aggregation.}
\label{fig:agg_hist}
\end{figure}

\begin{figure}[t]
\centering

% --- Top row: mean aggregation ---
\includegraphics[width=0.30\textwidth]{figures/plots/seg_ins_w_pannuke/hist_edl_ale.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/seg_ins_w_pannuke/hist_edl_epi.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/seg_ins_w_pannuke/hist_vacuity.png}


% --- Middle row: sum aggregation ---
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_sum_w_pannuke/hist_edl_ale.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_sum_w_pannuke/hist_edl_epi.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_sum_w_pannuke/hist_vacuity.png}

% --- Bottom row: median aggregation ---
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_median_w_pannuke/hist_edl_ale.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_median_w_pannuke/hist_edl_epi.png}\hfill
\includegraphics[width=0.30\textwidth]{figures/plots/ab/seg_ins_median_w_pannuke/hist_vacuity.png}

\caption{Instance-level histograms of the segmentation uncertainties on PanNuke for \textit{Ours w}. Columns, left to right: $u_{\mathrm{ale}}$, $u_{\mathrm{epi}}$, $u_{\mathrm{vac}}$. Rows, top to bottom: \textit{mean}, \textit{sum}, and \textit{median} aggregation.}
\label{fig:agg_hist_w}
\end{figure}

This appendix analyzes the sensitivity of instance-level evidential uncertainty to the choice of pixel-wise pooling operation. We compare three aggregation strategies (mean, sum, and median) applied to pixel-level Dirichlet parameters within each watershed-derived cell instance. The ablation is performed for both the \textit{Ours} and \textit{Ours w} variants using three cross-validation folds on PanNuke, while keeping the network, training protocol, and evaluation metrics unchanged. The goal is to assess whether the choice of pooling operation materially affects calibration, error–uncertainty separation, and interpretability of instance-level uncertainties.
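
As a notational sketch of the three strategies (the symbols here are introduced for illustration and are assumptions, not notation taken from the main paper), let $I$ denote the set of pixels in one instance, $n = |I|$, and $\alpha_{p,k} > 0$ the Dirichlet parameter of class $k$ at pixel $p$. Assuming that pooling is applied independently per class, the instance-level parameters would read
\[
\alpha^{\mathrm{mean}}_{k} = \frac{1}{n}\sum_{p \in I} \alpha_{p,k}, \qquad
\alpha^{\mathrm{sum}}_{k} = \sum_{p \in I} \alpha_{p,k} = n\,\alpha^{\mathrm{mean}}_{k}, \qquad
\alpha^{\mathrm{med}}_{k} = \mathrm{median}_{p \in I}\, \alpha_{p,k}.
\]
Under this reading, mean and sum pooling differ only by the factor $n$, so they induce the same expected class probabilities $\alpha_k / \sum_j \alpha_j$ and differ only in the total concentration, whereas the median is not a rescaling of either.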

Quantitative results are summarized in Table~\ref{tab:agg_ablation}. Mean aggregation yields the best or tied-best result on nearly every metric, covering calibration (ACE, MCE), uncertainty calibration (A-UCE, M-UCE), ranking (AUROC), and distributional separation (KS). Sum aggregation matches mean aggregation on ACE and MCE but leads to systematically degraded uncertainty calibration, particularly for epistemic uncertainty and vacuity, while median aggregation performs closer to mean but with slightly weaker error–uncertainty separation. Paired $t$-tests across folds confirm that mean aggregation significantly outperforms sum aggregation for calibration-related metrics ($p < 0.01$ for ECE, UCE, and Adj-UCE) and significantly outperforms median aggregation for AUROC. No metric shows a statistically significant advantage for sum or median over mean aggregation.
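
For reference, the standard paired $t$-statistic is sketched below, assuming each test pairs the fold-wise scores of two aggregation strategies over the $n = 3$ folds (the exact pairing is not spelled out in this appendix). With $d_i$ the difference in a given metric between mean aggregation and the alternative on fold $i$,
\[
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2, \qquad
t = \frac{\bar{d}}{s_d/\sqrt{n}},
\]
and $t$ is compared against a Student $t$ distribution with $n - 1 = 2$ degrees of freedom. For a two-sided test at this sample size, $p < 0.01$ corresponds to $|t| > 9.925$.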

The qualitative behavior underlying these trends is illustrated in Figures~\ref{fig:agg_hist} and~\ref{fig:agg_hist_w}, which report instance-level histograms of aleatoric uncertainty, epistemic uncertainty, and vacuity. For sum aggregation (middle row), epistemic uncertainty and vacuity collapse toward zero for nearly all instances. This effect is caused by the growth of the total Dirichlet concentration $S$ with instance size, which suppresses epistemic uncertainty and vacuity regardless of prediction correctness. Although some error separation may remain, the resulting uncertainty values are poorly calibrated and difficult to interpret. In contrast, median aggregation does not exhibit this collapse; however, its histograms are visually almost indistinguishable from those obtained with mean aggregation, for both \textit{Ours} and \textit{Ours w}, indicating no clear qualitative advantage over mean pooling.
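
The size dependence can be made explicit with a short worked derivation. It assumes the standard evidential definition of vacuity, $u_{\text{vac}} = K/S$ with $K$ classes and $S = \sum_k \alpha_k$, and class-wise pooling of the pixel-level parameters $\alpha_{p,k}$ over the $n$ pixels of an instance; the definitions used in the main paper may differ in detail. Writing $S_p = \sum_k \alpha_{p,k}$ and $\bar{S} = \frac{1}{n}\sum_p S_p$,
\[
S^{\mathrm{mean}} = \bar{S}, \qquad
S^{\mathrm{sum}} = \sum_{p} S_p = n\,\bar{S}, \qquad \text{hence} \qquad
u_{\text{vac}}^{\mathrm{sum}} = \frac{K}{n\,\bar{S}} = \frac{u_{\text{vac}}^{\mathrm{mean}}}{n}.
\]
If, as is common in evidential models, $\alpha_{p,k} \ge 1$, then $S_p \ge K$ and therefore $u_{\text{vac}}^{\mathrm{sum}} \le 1/n$: an instance of more than 100 pixels has vacuity below $0.01$ whatever the prediction. The variance of the class probabilities behaves similarly, since for a Dirichlet distribution $\mathrm{Var}[p_k] = \hat{p}_k (1 - \hat{p}_k)/(S + 1)$ with $\hat{p}_k = \alpha_k / S$, while $\hat{p}_k$ itself is unchanged by the rescaling. The latter is consistent with the near-identical ACE and MCE of mean and sum pooling in Table~\ref{tab:agg_ablation}.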

Based on both quantitative and qualitative evidence, mean aggregation provides the most reliable instance-level uncertainty representation. It avoids the degenerate behavior induced by summation, preserves interpretability of epistemic uncertainty and vacuity, and achieves consistently better calibration and error–uncertainty separation than median pooling. These results justify the use of mean aggregation as a stable and size-invariant pooling operation for instance-level evidential uncertainty in the main paper.

