\vspace {-.3cm}
\section{Spatially-Adaptive Conformal Prediction}

\noindent\textbf{Conformal Prediction.}
Conformal prediction (CP) is a statistical framework that wraps any pretrained model and produces prediction sets (or intervals) with a guarantee on their reliability~\cite{vovk2005algorithmic}.
For a given significance level $\alpha \in (0,1)$, a calibration dataset $\mathcal{D} = \big\{(x_i, y_i)\big\}_{i=1}^n$, and a new test point $(x_{n+1}, y_{n+1})$ drawn from the same distribution, CP constructs a prediction set $\mathcal{C}(x_{n+1})$ such that $\mathbb{P}\big(y_{n+1} \in \mathcal{C}(x_{n+1})\big) \geq 1 - \alpha$.
CP defines a non-conformity score $S:\mathcal{X}\times \mathcal{Y}\rightarrow \mathbb{R}^+$ that quantifies how poorly a candidate pair $(x,\hat{y})$ \textit{conforms} to the calibration dataset. The prediction set is then computed from the empirical quantiles of these non-conformity scores: for a chosen confidence level $1-\alpha$, it is defined as $\mathcal{C}(x_{n+1}) = \big\{\hat{y} \in \mathcal{Y} : S(x_{n+1}, \hat{y}) \leq \tau_{\alpha}\big\}$,
where $\tau_{\alpha}$ is the $(1-\alpha)$-quantile of the non-conformity scores on the calibration dataset.
This guarantee is unconditional (marginal) and holds for any model and any distribution, as long as the underlying exchangeability assumption is satisfied.
Exchangeability means that the joint distribution of the data is invariant to reordering: for any permutation $\pi$ of $\{1,2,\ldots,n\}$, $\mathbb{P}\big((x_1,y_1),\ldots,(x_n,y_n)\big)=\mathbb{P}\big((x_{\pi(1)}, y_{\pi(1)}), \ldots, (x_{\pi(n)}, y_{\pi(n)})\big)$.
We refer the reader to~\cite{angelopoulos2023conformal} for a more in-depth introduction to CP. The \textit{conservativeness} of this guarantee is also adjustable (by defining different thresholds for $p$-values), which can be beneficial when cautious coverage is preferred (Appendix~\ref{app_conservative}).
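
\noindent\textbf{Illustrative example.}
As a minimal sketch with hypothetical numbers (not taken from any experiment), suppose the $n=9$ calibration scores, sorted in increasing order, are $\{0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.50, 0.60, 0.90\}$ and $\alpha = 0.2$. Assuming the common finite-sample convention that takes $\tau_{\alpha}$ as the $\lceil (n+1)(1-\alpha)\rceil$-th smallest calibration score, we obtain
\[
\lceil (9+1)(1-0.2) \rceil = 8 \quad\Longrightarrow\quad \tau_{0.2} = 0.60\ .
\]
If a test point has $S(x_{n+1}, a) = 0.40$ and $S(x_{n+1}, b) = 0.70$ for two candidate labels $a, b \in \mathcal{Y}$, only $a$ satisfies $S(x_{n+1},\hat{y}) \leq \tau_{0.2}$, so $\mathcal{C}(x_{n+1}) = \{a\}$.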

\vspace {-.2cm}
\subsection{Problem Setup}
Let $\mathcal{X}\subset\mathbb{R}^3$ represent a discretized volumetric image obtained from axial slices, where $\mathcal{X}$ is subdivided into a finite, structured grid of cuboidal units called voxels. We define each voxel $x \in \mathcal{X}$  by its indices $(x_a,x_c,x_s)$ along the axial, coronal, and sagittal axes, respectively.
Given a set of possible labels $\mathcal{Y}$, the true label $y\in\mathcal{Y}$ indicates the organ to which the voxel $x$ belongs, and the baseline segmentation model $f_{\Theta}$
outputs predictive probabilities $p(\hat{y}|x)$ for each label $\hat{y}\in\mathcal{Y}$.
\vspace {-.1cm}
\begin{definition}[Canonical Object]
A label $l \in \mathcal{Y}$ denotes a canonical object if $l$ represents a primary structure of interest for the downstream task.
\end{definition}
\vspace {-.4cm}
\begin{definition}[Critical Masses]
$\mathcal{M}$ is a set of critical masses in the volume if the proximity of any $m \in \mathcal{M}$ to the canonical object necessitates conservative decision-making for the downstream task.
\end{definition}

\vspace {-.2cm}
In the clinical setting described in Section~\ref{intro}, the tumor is the \textit{canonical object} for the downstream task of planning its surgical removal, and the vessels in the volume are the \textit{critical masses}: when a vessel lies close to the tumor, the surgeon needs a more conservative prediction set, i.e., one that reflects higher uncertainty.

\vspace {-.2cm}
\subsection{Spatially-Adaptive Non-Conformity Score}
To apply CP to voxel-wise tasks (e.g., tumor segmentation for surgery planning), we need to address two challenges. (1) Standard CP uses a single threshold $\tau_{\alpha}$ across all classes; thus, the prediction sets for rare classes will be either over- or under-conservative, depending on the class distribution. This is particularly important in tumor segmentation, as tumors are small structures relative to the total CT image volume.
(2) The construction of the prediction set is invariant to voxel location, whereas we expect the prediction set to be more conservative when voxels of the canonical object (e.g., tumor) are closer to one or more critical masses (e.g., vessels).

To address the first challenge, we adopt Class-Conditional Conformal Prediction (CCCP), which refines CP by using a separate quantile threshold for each class~\cite{shi2013applications,sadinle2019least}.
We compute a distinct threshold $\tau^{\hat{y}}_{\alpha}$ for each label $\hat{y} \in \mathcal{Y}$ independently, as the $(1-\alpha)$-quantile of the non-conformity scores of the calibration points whose true label is $\hat{y}$:
\vspace {-.2cm}
\begin{equation}
    \tau^{\hat{y}}_{\alpha} = \text{Quantile}_{1-\alpha}\Big(\{S_{\text{base}}(x_i,y_i): y_i=\hat{y}\}_{i=1}^n\Big)\ .
\label{eq_class_q}
\end{equation}
\vspace {-.2cm}
For a new test data point $(x_{n+1},y_{n+1})$, the prediction set $\mathcal{C}(x_{n+1})$ is constructed as,
\begin{equation}
    \mathcal{C}(x_{n+1}) = \big\{\hat{y} \in \mathcal{Y}: S_{\text{base}}(x_{n+1},\hat{y}) \leq \tau^{\hat{y}}_{\alpha}\big\}\ .
\label{eq_ps_gen}
\end{equation}
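
\noindent\textbf{Illustrative example.}
The following sketch uses hypothetical numbers to show why class-specific thresholds matter for a rare class. Let $t$ (tumor) and $b$ (background) be two labels with calibration scores $\{0.30, 0.55, 0.70, 0.85\}$ and $\{0.02, 0.05, 0.08, 0.10\}$, respectively, and let $\alpha = 0.25$. Assuming the finite-sample convention that takes the $\lceil (n+1)(1-\alpha)\rceil$-th smallest score among the $n$ scores considered, the class-conditional thresholds and the single pooled threshold are
\[
\tau^{t}_{\alpha} = 0.85\ , \qquad \tau^{b}_{\alpha} = 0.10\ , \qquad \tau^{\text{pooled}}_{\alpha} = 0.70\ ,
\]
since $\lceil 5 \cdot 0.75 \rceil = 4$ within each class and $\lceil 9 \cdot 0.75 \rceil = 7$ for the pooled scores. For a test voxel with $S_{\text{base}}(x_{n+1}, t) = 0.80$ and $S_{\text{base}}(x_{n+1}, b) = 0.20$, Equation~\ref{eq_ps_gen} yields $\mathcal{C}(x_{n+1}) = \{t\}$, whereas the pooled threshold would yield $\{b\}$ and miss the tumor label.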

\vspace {-.2cm}
To address the second challenge, we define a new score function, $S_{\text{SACP}}$, that augments the original CP non-conformity score function, $S_{\text{base}}$, with a parameterized weight $w_v \in [0.5,1)$ as a multiplicative factor:
\begin{equation}
    S_{\text{SACP}}(x|\hat{y}=l) = w_v(x,l)\cdot S_{\text{base}}(x|\hat{y}=l)\ ,
\label{sacp_nonconf}
\end{equation} 
where $v \in \mathcal{M}$ is the critical mass nearest to the voxel $x$, and $l \in \mathcal{Y}$ is the canonical object. Our intention is twofold. When a voxel is far from both the critical masses and the canonical object, the weight should have no effect ($w_v\approx1$), so that the prediction set is as conservative as the one produced by the base score. When a voxel is very close to both, the effect of the weight should be maximal ($w_v\to0.5$). The weight should also reflect our confidence in segmenting the canonical object (tumor) as well as the relevancy of the different critical masses, since there may be more than one critical mass, each with a different relevancy factor for the downstream task. Formally, the weight function has four parameters:
\vspace {-.2cm}
\begin{enumerate}
    \item $\delta_m$: The Euclidean distance from the voxel $x$ to the critical mass $m \in \mathcal{M}$.
    \item $\phi_l$: The Euclidean distance from the voxel $x$ to the canonical object $l \in \mathcal{Y}$.
    \item $\mathcal{I}(l)$: The information-theoretic surprisal of observing the canonical object $l$ with probability $p(\hat{y}=l|x)$, defined as $\mathcal{I}(l)\defeq-\log p(\hat{y}=l|x)$. It is inversely related to our confidence in the correct segmentation of the canonical object.
    \item $\gamma_m \in (0,1]$: A hyperparameter capturing the relevancy and criticality of each critical mass $m \in \mathcal{M}$.
\end{enumerate}

\vspace {-.2cm}
Putting these together, the weight function for each voxel $x$ is defined as
\vspace {-.2cm}
\begin{equation}
w_v(x,l) = \sigma\bigg(\overbrace{\frac{1}{\gamma_v} \Big(\phi_l + \delta_v \mathcal{I}(l)\Big)}^{\tilde{w}_v}\bigg) \qquad \text{s.t.} \qquad v = \underset{m\in \mathcal{M}}{\arg\min}\ \delta_{m}\ ,
\label{weight_eq} 
\end{equation}
where $\tilde{w}_v: \mathcal{X} \times \mathcal{Y} \times \mathcal{M} \rightarrow \mathbb{R}^+$ is the raw weight value and the constraint selects the nearest critical mass. Because $\tilde{w}_v \geq 0$, applying the sigmoid function $\sigma(\cdot)$ maps it to $w_v(x,l) \in [0.5,1)$ (further details in Appendix~\ref{app_sacp_details}).

The weight approaches 0.5 for voxels that lie close to the canonical object and are either near a critical mass or confidently predicted as tumor, which increases the conservativeness of the prediction set.
When $\delta_v$ is small and $p(\hat{y}=l|x)$ is high (i.e., $\mathcal{I}(l)$ is low), $w_v$ moves toward its lower bound, reducing the non-conformity score of label $l$ and making it more likely to be included in the prediction set (more conservative). This matches the goal of treating voxels around the critical masses and the canonical object more conservatively and expanding the prediction set in those areas. Conversely, as $\delta_v$ grows and $p(\hat{y}=l|x)$ shrinks ($\mathcal{I}(l)$ grows), $w_v$ moves toward its upper bound of 1, which removes the effect of the weight and makes label $l$ less likely to be included in the prediction set for distant regions. $\phi_l$ behaves similarly: a lower $\phi_l$ (the voxel being closer to the canonical object) yields a lower weight and a more conservative set, and vice versa.

The relevancy hyperparameter $\gamma_v$ takes values in $(0,1]$, from close to zero for the least critical mass to 1 for the most critical mass. It scales $\tilde{w}_v$, and in turn $w_v$, inversely. For example, if the user sets a low relevancy for a critical mass (e.g., $\gamma_v=0.5$), the raw weight $\tilde{w}_v$ computed from all other factors is doubled, so $w_v$ moves closer to 1 and its effect on the original non-conformity score diminishes. In contrast, as the relevancy increases, $w_v$ moves closer to its lower bound of 0.5 and the prediction set grows. Note that the value of $\gamma_v$ needs to be tuned for the context of the application.
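
\noindent\textbf{Illustrative example.}
To make the behavior of Equation~\ref{weight_eq} concrete, the following sketch evaluates it for hypothetical inputs. We assume here that distances are measured in arbitrary units and that $\log$ is the natural logarithm; neither choice is prescribed by the definition above. For a voxel with $\phi_l = 0.5$, $\delta_v = 1$, and $p(\hat{y}=l|x) = 0.9$, we have $\mathcal{I}(l) = -\log 0.9 \approx 0.105$, and
\[
\gamma_v = 1:\quad \tilde{w}_v \approx 0.5 + 1 \cdot 0.105 = 0.605\ ,\quad w_v = \sigma(0.605) \approx 0.647\ ,
\]
\[
\gamma_v = 0.5:\quad \tilde{w}_v \approx 2 \cdot 0.605 = 1.211\ ,\quad w_v = \sigma(1.211) \approx 0.770\ .
\]
Halving the relevancy thus doubles the raw weight and moves $w_v$ toward 1. In the limiting case $\phi_l = \delta_v = 0$, the raw weight vanishes and $w_v = \sigma(0) = 0.5$ regardless of $\gamma_v$.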

\vspace {-.2cm}
\begin{theorem}[SACP Conservativeness]
Let $S_{\text{base}}(x,\hat{y})$ denote the base non-conformity score for a voxel $x \in \mathcal{X}$ with the predicted label $\hat{y}$, and let $\tau_{\alpha}^{\hat{y}}$ be the $(1-\alpha)$-quantile of $S_{\text{base}}$
at error rate $\alpha$. Then, for the canonical object $l \in \mathcal{Y}$, the prediction set produced by the non-conformity function $S_{\text{SACP}}(x|\hat{y}=l)$ is at least as conservative as the set produced by $S_{\text{base}}(x|\hat{y}=l)$. See Appendix~\ref{app_proof} for the proof.
\label{conservative_theorem}
\end{theorem}
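
\noindent\textbf{Intuition.}
The following one-line argument is only a sketch of why the statement is plausible and is not a substitute for the proof in Appendix~\ref{app_proof}. Since $S_{\text{base}} \geq 0$ and $w_v(x,l) \in [0.5,1)$,
\[
S_{\text{SACP}}(x|\hat{y}=l) = w_v(x,l)\cdot S_{\text{base}}(x|\hat{y}=l) \leq S_{\text{base}}(x|\hat{y}=l)\ ,
\]
so whenever $S_{\text{base}}(x|\hat{y}=l) \leq \tau^{l}_{\alpha}$ we also have $S_{\text{SACP}}(x|\hat{y}=l) \leq \tau^{l}_{\alpha}$. Every voxel whose base prediction set contains $l$ therefore keeps $l$ under SACP, provided the same threshold $\tau^{l}_{\alpha}$ is used.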

\vspace {-.5cm}
\begin{corollary}
For an unseen data point $x_{new}$ and the score $S_{\text{SACP}}(x_{new},\hat{y})$
(Equation~\ref{sacp_nonconf}), the inclusion of the canonical object label $l$ in the prediction set $\mathcal{C}(x_{new})$ at error rate $\alpha$ depends on the spatial properties of $x_{new}$ near high-risk regions and satisfies
\vspace {-.2cm}
\begin{equation}
    l \in \mathcal{C}(x_{new})\ \Longleftrightarrow\ S_{\text{SACP}}(x_{new}|\hat{y}=l) \leq \tau^{l}_{\alpha}\ ,
\end{equation}
\vspace {-.2cm}
where $\tau^{l}_{\alpha}$ is the class-specific threshold, i.e., the $(1-\alpha)$-quantile of the base non-conformity scores $S_{\text{base}}$ for label $l$.
\label{label_corollary}
\end{corollary}
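
\noindent\textbf{Illustrative example.}
As a sketch with hypothetical values, suppose $\tau^{l}_{\alpha} = 0.60$ and a voxel has $S_{\text{base}}(x_{new}|\hat{y}=l) = 0.80$, so the base score alone would exclude $l$. If the voxel lies close to the canonical object and a critical mass, with an assumed weight $w_v = 0.65$, then
\[
S_{\text{SACP}}(x_{new}|\hat{y}=l) = 0.65 \cdot 0.80 = 0.52 \leq 0.60\ ,
\]
and $l$ enters the prediction set. For a distant voxel with an assumed weight $w_v = 0.99$, the score is $0.99 \cdot 0.80 = 0.792 > 0.60$, and $l$ remains excluded, as under the base score.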

SACP maintains two key theoretical guarantees: (1) the standard CP coverage $\mathbb{P}\big(y \in \mathcal{C}(x)\big) \geq 1-\alpha$, and (2) for the canonical object $l$, $\mathcal{C}(x)$ is at least as conservative as the standard CP set (proof in Appendix~\ref{app_proof}). The spatial relationship that~\equationref{sacp_nonconf,weight_eq} promote is particularly valuable in the clinical setting described in Section~\ref{intro} for determining resectability.
This assessment follows region-specific clinical guidelines (for instance, the NCCN guidelines in the US and the DPCG guidelines in the Netherlands), each defining different criteria for vessel involvement and tumor contact thresholds that determine resectability. Consequently, when deploying pretrained tumor segmentation models across different regions, the vessel-specific importance factors also need to be recalibrated to align with local clinical guidelines and their specific vessel prioritization.
The detailed procedure for generating a prediction set with SACP is given in Algorithm~\ref{scpa_algo} in Appendix~\ref{app_algo}.


