\vspace {-.2cm}
\section{Experimental Evaluations}
\label{exp}
\noindent\textbf{Datasets and Ground Truth.}
This retrospective study analyzes 30 contrast-enhanced CT scans from the PANORAMA Challenge~\cite{panorama2024}, comprising portal venous phase scans from five European centers. We further validate our approach on 30 CT scans from the Memorial Sloan Kettering (MSK) Medical Segmentation Decathlon dataset~\cite{simpson2019large}, with detailed results in Appendix~\ref{additional_results}. 
Ground truth segmentations for PDAC tumors were provided by expert radiologists, while anatomical structures (pancreas, duodenum, liver, gallbladder, kidneys, adrenal glands, spleen) were segmented using TotalSegmentator~\cite{wasserthal2023totalsegmentator}. A hierarchical fusion strategy prioritizes radiologist annotations over automated segmentations.

\vspace{1mm}\noindent\textbf{AI-Generated Segmentations.}
We employ two deep learning models: a primary model for PDAC tumors and surrounding anatomical structures~\cite{bereska2024artificial}, and a vessel-specific model focused on structures critical for PDAC resectability assessment. The latter segments five key vessels: the celiac trunk (CeTr), hepatic artery (HA), portal vein (PV), superior mesenteric artery (SMA), and superior mesenteric vein (SMV). The PDAC segmentation employs a tripartite architecture of teacher, professor, and student models, all implemented using 3D U-Net cascades. The final student model was trained on a dataset of 1085 CTs from 903 patients
(see Appendix~\ref{app:ai_seg} for details).
For subsequent non-conformity score computation, we preserve (1) the pre-softmax probability maps for all 11 classes (10 anatomical labels plus background) from the primary segmentation model, and (2) distance maps computed from the vessel-specific model, measuring the distance from each voxel to each of the five resectability-determining vessels. To ensure robustness to outliers, both distance and probability values are clipped at their respective 95th percentiles before being used in the non-conformity score computation. 
Our analysis shows that these segmentation models exhibit spatially varying accuracy, with significantly lower Dice scores near vessels ($\leq 5$\,mm: median 0.75, mean 0.64) than in more distant regions (median 0.812, mean 0.75). This performance gap, coupled with higher variability near vessels (SD: 0.27 vs.\ 0.15), highlights a key challenge in medical image segmentation: critical regions often suffer from both lower model performance and greater inter-observer variability. Since the ground truth itself is ambiguous in these areas, improving segmentation models alone may not suffice, underscoring the need for more precise, region-sensitive uncertainty quantification.

\vspace{1mm}\noindent\textbf{Cropping.}
To optimize computational efficiency, we use an adaptive 3D bounding box cropping strategy. We identify the minimal volumetric boundary encompassing all voxels with specified target labels (gallbladder, pancreas, duodenum, and tumor) and apply this crop consistently across all corresponding image modalities and their derivatives.

\vspace {-.2cm}
\subsection{Evaluation Metrics}
We evaluate our spatially-adaptive framework through metrics assessing both predictive performance and anatomical sensitivity.
For each voxel $x\in \mathcal{X}$ with confidence level $1-\alpha$, we compute the empirical coverage rate for the set-valued predictor function $\mathcal{C}$ as $\text{cov}(\mathcal{C}) =  \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \mathbbm{1}_{\{y \in \mathcal{C}(x)\}}$,
where $\mathbbm{1}$ is the indicator function and $y\in \mathcal{Y}$ is the true label of voxel $x$. Coverage is assessed separately for vessel-adjacent ($\delta_v \leq 5\text{mm}$) and non-critical regions ($\delta_v > 5\text{mm}$), where $\delta_v$ denotes the distance to the nearest vessel $v$. A higher coverage rate reflects more conservative predictions, typically obtained through larger prediction sets; regions with larger sets prompt clinicians to exercise additional caution and potentially seek further diagnostic imaging or expert review.
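
As an illustration with hypothetical voxel counts (not taken from the experiments reported here), suppose that $962$ of $1000$ vessel-adjacent voxels and $8910$ of $9000$ non-critical voxels are covered. The stratified and pooled coverage rates are then
\[
\text{cov}_{\leq 5\text{mm}} = \frac{962}{1000} = 0.962, \qquad
\text{cov}_{> 5\text{mm}} = \frac{8910}{9000} = 0.990, \qquad
\text{cov}(\mathcal{C}) = \frac{962 + 8910}{10000} \approx 0.987 .
\]
In this sketch the pooled rate is dominated by the larger non-critical region, which is one reason to report coverage separately for the two regions.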

To evaluate calibration quality across different prediction set sizes, we employ Size-Stratified Coverage Violation (SSCV) analysis, which examines how empirical coverage varies across $K$ bins of prediction set cardinality, $\mathcal{S}_j\subset\{1,2,\ldots,|\mathcal{Y}|\}$ for $j=1,\ldots,K$, where $\mathcal{Y}$ is the set of possible classes. $\text{SSCV}\in [0,1)$ is computed as $\text{SSCV}(\mathcal{C},\{\mathcal{S}_j\}_{j=1}^K) = \sup_j \left|\frac{1}{|\mathcal{X}_j|} \sum_{x \in \mathcal{X}_j} \mathbbm{1}_{\{y \in \mathcal{C}(x)\}} - (1-\alpha)\right|$,
where $\mathcal{X}_j = \{x\in \mathcal{X}: |\mathcal{C}(x)| \in \mathcal{S}_j\}$ is the set of voxels whose prediction sets fall in size range $\mathcal{S}_j$~\cite{angelopoulos2020uncertainty}. Lower SSCV values indicate better calibration across different set sizes.
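
As a small illustration with hypothetical values, take $\alpha = 0.05$ and $K = 3$ bins with empirical coverage $0.97$, $0.99$, and $0.93$. Then
\[
\text{SSCV} = \max\{|0.97-0.95|,\ |0.99-0.95|,\ |0.93-0.95|\} = \max\{0.02,\ 0.04,\ 0.02\} = 0.04 ,
\]
so the statistic reports the worst-calibrated bin, regardless of whether that bin over- or under-covers.
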
We also define the Relative Width Ratio (RWR) to quantify how prediction set sizes adapt to anatomical criticality around an arbitrary distance threshold $r$ as
\vspace {-.1cm}
\begin{equation}
    \rho(r) = \frac{\mu(\mathcal{C} | \delta_v > r)}{\mu(\mathcal{C} | \delta_v \leq r)}\ ,
\end{equation}
where, for a given condition on $\delta_v$, $\mu(\mathcal{C}|\delta_v)=\frac{1}{|\mathcal{X}_v|} \sum_{x \in \mathcal{X}_v} \big|\mathcal{C}(x)\big|$ is the average prediction set size over the set $\mathcal{X}_v$ of voxels whose distance $\delta_v$ to the nearest vessel $v$ satisfies that condition; it serves as a measure of prediction set efficiency.
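
For intuition, consider hypothetical average set sizes of $2.0$ classes for voxels with $\delta_v > r$ and $2.5$ classes for voxels with $\delta_v \leq r$ (illustrative numbers only). Under the definition above,
\[
\rho(r) = \frac{2.0}{2.5} = 0.8 ,
\]
so $\rho(r) < 1$ corresponds to prediction sets that are on average wider near vessels than away from them, and $\rho(r) = 1$ corresponds to no spatial adaptation at threshold $r$.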
\color{black}
\vspace {-.2cm}
\subsection{Experimental Setup}
We use 10 cases for calibration to determine class-specific non-conformity score thresholds $\tau^{\hat{y}}_{\alpha}$ for each label $\hat{y}\in \mathcal{Y}$ and evaluate on 20 held-out cases. Statistical comparisons use paired t-tests with Benjamini--Hochberg correction (significance threshold $p < 0.05$).
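
A common way to obtain such a threshold, stated here as an assumption rather than as the exact rule used in this work, is the finite-sample-corrected quantile of split conformal prediction. Given $n$ calibration non-conformity scores $s_{(1)} \leq \dots \leq s_{(n)}$ for label $\hat{y}$ (the symbols $n$ and $s_{(k)}$ are introduced only for this sketch),
\[
\tau^{\hat{y}}_{\alpha} = s_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil .
\]
For example, with a hypothetical $n = 199$ and $\alpha = 0.05$, $k = \lceil 200 \times 0.95 \rceil = 190$, so the threshold would be the 190th smallest calibration score.
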
For the vessel-specific analysis, we incorporate anatomical context through a weighted scoring mechanism. Critical vessels are assigned different relevancy hyperparameters ($\gamma$) based on the NCCN resectability criteria for PDAC, with arterial vessels (CeTr, HA, SMA) receiving higher weights ($\gamma = 0.8$) compared to venous vessels (PV, SMV: $\gamma = 0.6$). This weighting scheme reflects their relative importance in determining resectability, as arterial involvement beyond $180^{\circ}$ renders a tumor unresectable, while venous involvement may permit resection with reconstruction. 

To achieve sharp transitions in uncertainty estimates near vessel boundaries, we amplify the sigmoid response using a gain factor ($\beta = 10$), creating more pronounced changes in uncertainty estimates as predictions approach critical vascular structures. This enhanced sigmoid sensitivity provides a clearer delineation of high-risk regions for surgical planning. Computational requirements and performance metrics are detailed in Appendix~\ref{comp_req}.

\vspace {-.2cm}
\subsection{Experimental Results}
\noindent\textbf{Coverage Analysis.}
Our framework achieves strong coverage on the PANORAMA dataset ($n=20$), with an overall coverage of $0.987$ (mean per-case: $0.981 \pm 0.005$ SEM) that significantly exceeds the target coverage of $0.95$ (Wilcoxon signed-rank test, $p=0.0007$). SSCV analysis for the tumor label shows consistent calibration across the $89\%$ of voxels whose prediction sets contain 0--3 elements, with a coverage violation of $0.037$ and coverage rates between $0.987$ and $0.988$.

\vspace{1mm}\noindent\textbf{Distance-Based Analysis.}
As shown in Table~\ref{tab:coverage}, prediction set size decreases with distance from vessels while coverage remains high. RWR decreases from $2.762 \pm 0.150$ SEM near vessels ($\leq 2$\,mm) to $2.525 \pm 0.036$ SEM beyond $20$\,mm, while coverage stays within $0.981$--$0.988$ across all distances. This decreasing RWR pattern suggests that our method yields more precise predictions in regions farther from vessels while maintaining wider prediction sets near critical vascular structures.
Vessel-specific analysis in Table~\ref{tab:vessel_coverage} shows SACP coverage of at least $0.953$ for every vessel and proximity zone. Coverage is also high in the critical surgical planning zone within $2$\,mm of the arteries (CeTr: $1.000$, HA: $0.980$, SMA: $0.975$). Visual examples of the prediction sets and their relationship to vessel proximity are provided in Appendix~\ref{visual_examples}. We conducted additional experiments with varying vessel relevancy hyperparameters ($\gamma$) to understand their impact on prediction set characteristics and coverage guarantees; detailed results are presented in Appendix~\ref{gamma_exp}.

\begin{table}[t]
\centering
\begin{minipage}{0.48\textwidth}
\caption{Coverage and RWR analysis across vessel proximity zones using Least Ambiguous Confidence score (LAC) at $\alpha=0.05$.}
\label{tab:coverage}
\tiny
\resizebox{\columnwidth}{!}{
\begin{tabular}{l|cc|cc}
\hline
\multirow{2}{*}{Distance} & \multicolumn{2}{c|}{CCCP} & \multicolumn{2}{c}{SACP} \\
& Coverage & RWR & Coverage & RWR \\
\hline
$\leq2\text{mm}$   & 0.954 & 2.887 & 0.981 & 2.762 \\
$\leq5\text{mm}$   & 0.970 & 2.702 & 0.987 & 2.684 \\
$\leq10\text{mm}$  & 0.977 & 2.611 & 0.988 & 2.621 \\
$\leq20\text{mm}$  & 0.978 & 2.574 & 0.987 & 2.592 \\
$>20\text{mm}$    & 0.982 & 2.509 & 0.988 & 2.525 \\
\hline
\end{tabular}
}
\end{minipage}
\hfill
\begin{minipage}{0.51\textwidth}
\centering
\caption{Vessel-specific coverage rates at different proximity zones for CCCP (C) and SACP (S) at $\alpha=0.05$.}
\label{tab:vessel_coverage}
\resizebox{1.02\columnwidth}{!}{
\begin{tabular}{l|cc|cc|cc|cc|cc}
\hline
\multirow{2}{*}{Vessel} & \multicolumn{2}{c|}{$\leq$2mm} & \multicolumn{2}{c|}{$\leq$5mm} & \multicolumn{2}{c|}{$\leq$10mm} & \multicolumn{2}{c|}{$\leq$20mm} & \multicolumn{2}{c}{$>$20mm} \\
& C & S & C & S & C & S & C & S & C & S \\
\hline
CeTr & 0.999 & 1.000 & 0.999 & 1.000 & 0.998 & 1.000 & 0.980 & 0.987 & 0.980 & 0.988 \\
HA   & 0.959 & 0.980 & 0.973 & 0.986 & 0.987 & 0.994 & 0.981 & 0.989 & 0.980 & 0.987 \\
SMA  & 0.925 & 0.975 & 0.967 & 0.989 & 0.982 & 0.994 & 0.973 & 0.985 & 0.984 & 0.989 \\
PV   & 0.927 & 0.953 & 0.955 & 0.974 & 0.957 & 0.974 & 0.978 & 0.989 & 0.981 & 0.987 \\
SMV  & 0.960 & 0.997 & 0.956 & 0.987 & 0.958 & 0.980 & 0.975 & 0.987 & 0.983 & 0.987 \\
\hline
\end{tabular}
}
\end{minipage}
\vspace {-.4cm}
\end{table}

\begin{figure}[t]
    \centering
\includegraphics[width=\linewidth]{figures/combinedfigure.pdf}
    \caption{ 
    Left: RWR (top) and coverage (bottom) as a function of vessel distance for both datasets. Right: Comparison of empirical coverage at different confidence levels between our method (SACP) and standard Class-Conditional CP (CCCP).}
    \label{fig:results}
\vspace {-.4cm}
\end{figure}

\vspace{1mm}\noindent\textbf{Comparison with Standard Class-Conditional CP.}
Our spatially adaptive approach achieves significantly higher coverage than CCCP ($0.981\pm 0.005$ SEM vs.\ $0.968 \pm 0.038$ SEM; paired t-test, $t=3.366$, $p=0.003$). Near vessels ($\leq 2$\,mm), it achieves both higher coverage ($0.981$ vs.\ $0.954$) and lower RWR ($2.762$ vs.\ $2.887$). Figure~\ref{fig:results} shows consistently better coverage across target confidence levels, particularly in the 40--80\% range. Our method maintains high coverage while RWR decreases with distance from vessels, from $2.762 \pm 0.150$ SEM at $\leq 2$\,mm to $2.525 \pm 0.036$ SEM beyond $20$\,mm, indicating that the framework adapts prediction sets to proximity to critical anatomical structures. Additional experiments, including a uniformly conservative CCCP baseline and a binary weighting scheme, are reported in Appendices~\ref{ucccp} and~\ref{binary}; validation on the MSK dataset, which supports generalizability across different clinical contexts, is reported in Appendix~\ref{additional_results}.
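
The near-vessel comparison can be made explicit from the $\leq 2$\,mm row of Table~\ref{tab:coverage}. The following is simple arithmetic on the reported values, not an additional result:
\[
0.981 - 0.954 = 0.027, \qquad
\frac{2.887 - 2.762}{2.887} \approx 0.043 .
\]
That is, within $2$\,mm of vessels SACP coverage is about $2.7$ percentage points higher than CCCP coverage, with an RWR that is about $4.3\%$ lower.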






