\section{Method}
\begin{figure*}[t] \centering
    \includegraphics[width=1.0\textwidth]{./imgs/VesselFigure2.pdf}
    \caption{
    Overview of the proposed framework.
    (Top) The slice-wise framework with 3D consistency processes the volume slice-by-slice, enforcing volumetric consistency via Local Continuity Loss ($L_{local}$) and Global Consistency Loss ($L_{global}$).
    (Bottom) The Automatic Prompt Generation module propagates the prompt from slice $i-1$ to $i$. It utilizes an Offset Prediction network and a Refinement Module based on previous features $F_{i-1}$ to accurately locate the vessel center $p_i$.
    }
    \label{fig:figure2}
\end{figure*}
Building on the success of prompt-based learning in medical imaging \cite{ma_2024NC_MedSAM}, we propose a dimension-hybrid framework that adapts 2D foundation models to 3D vascular segmentation.
As illustrated in Fig.~\ref{fig:figure2}, our method requires only a \textit{single} user-provided point on the first slice to segment the entire 3D volume. 
The framework consists of two synergistic components: 
(1) a \textbf{Slice-wise Framework with 3D consistency} that extends a 2D SAM-based backbone with local and global geometric constraints to ensure volumetric consistency, and 
(2) a \textbf{Topology-Aware Prompt Generator} that automatically propagates and refines point prompts across slices, eliminating the need for slice-by-slice prompting.

\subsection{Weakly Supervised Slice-wise Framework with 3D Geometric Constraints}
Conventional volumetric methods \cite{isensee_2021NM_nnUNet} often incur high computational cost, while naive 2D slice-based approaches neglect the inherent 3D continuity of vascular structures. Our approach bridges this gap: a pre-trained 2D SAM encoder extracts features from individual slices, and a two-level regularization scheme enforces 3D consistency across them.

Given a 3D volume, we process it as a sequence of 2D slices. For each slice $I_i$, the network takes the image and an automatically generated prompt $p_i$ (detailed in Sec.~\ref{sec:prompt_gen}) to predict a segmentation probability map $P_i$. To address the topological fragmentation common in 2D-based predictions, we introduce two loss functions:

\noindent\textbf{Local Continuity Loss ($L_{local}$).} 
Vascular structures exhibit gradual morphological transitions across adjacent slices. To enforce this smoothness, we impose a local continuity constraint that penalizes abrupt changes in the segmentation shape and position.
We formulate the inter-slice gradient consistency as:
\begin{equation}
    L_{grad} = \sum_{i} \left\| \nabla_z P_i - \nabla_z P_{i+1} \right\|_1
\end{equation}
where $\nabla_z P_i = P_i - P_{i-1}$ is the discrete difference of probability maps along the $z$-axis (slice dimension), and the sum runs over the interior slices for which both differences are defined. Minimizing the difference between consecutive gradients ($\nabla_z P_i$ and $\nabla_z P_{i+1}$) encourages a constant velocity in shape evolution, resulting in smooth vascular boundaries along the slice direction.
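
As a short worked step that follows directly from the definition of $\nabla_z$, each summand of $L_{grad}$ expands to a second-order central difference along $z$:
\begin{equation}
    \nabla_z P_i - \nabla_z P_{i+1} = (P_i - P_{i-1}) - (P_{i+1} - P_i) = -\left( P_{i+1} - 2 P_i + P_{i-1} \right).
\end{equation}
$L_{grad}$ is therefore the $\ell_1$ norm of the discrete second derivative of the probability maps. It vanishes exactly when every pixel of $P_i$ changes linearly across slices, which is the constant-velocity behavior. A first-order penalty $\| P_i - P_{i-1} \|_1$, by contrast, would also discourage any steady drift in vessel position or size.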

Additionally, to ensure intra-slice spatial smoothness and reduce noise, we incorporate a Total Variation (TV) term. The aggregate local loss is defined as:
\begin{equation}
    L_{local} = L_{grad} + \lambda_{\text{TV}} \sum_{i} \left\| \nabla_{xy} P_i \right\|_1
\end{equation}
where $\nabla_{xy}$ denotes the spatial gradient within the 2D slice, and $\lambda_{\text{TV}}$ balances the regularization strength (empirically set to 0.1).
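
The discretization of the TV term is not fixed by the definition of $\nabla_{xy}$. A common choice, assumed here purely for illustration, is the anisotropic forward-difference form:
\begin{equation}
    \left\| \nabla_{xy} P_i \right\|_1 = \sum_{x,y} \left( \left| P_i(x+1,y) - P_i(x,y) \right| + \left| P_i(x,y+1) - P_i(x,y) \right| \right),
\end{equation}
where $P_i(x,y)$ denotes the predicted probability at pixel $(x,y)$ (notation introduced only for this sketch) and differences that cross the image border are omitted. Under this assumed form, a single isolated pixel with probability 1 on a zero background contributes 4 to the sum, and a binary region is penalized in proportion to its boundary length rather than its area, so scattered speckle is discouraged more strongly than a compact vessel cross-section of the same area.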

\noindent\textbf{Global Consistency Loss ($L_{global}$).} 
While local constraints handle immediate transitions, they may fail to prevent semantic drift over long distances. We therefore introduce a global consistency loss operating in the latent feature space. We assume that the deep feature representations of the same vessel should remain semantically consistent across the volume.
We maximize the cosine similarity between the feature embeddings $F$ of adjacent slices:
\begin{equation}
    L_{global} = \sum_{i} \left( 1 - \frac{F_i \cdot F_{i+1}}{\|F_i\|_2 \|F_{i+1}\|_2} \right)
\end{equation}
where $F_i$ is the bottleneck feature map extracted by the Prompt Encoder for slice $i$. This constraint ensures that the network maintains a stable internal representation of the vascular target throughout the 3D volume.
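
For concreteness, assume that each feature map is flattened to a vector before the inner product is taken (an assumption made for this illustration). Each summand of $L_{global}$ then equals $1 - \cos\theta_i$, where $\theta_i$ is the angle between $F_i$ and $F_{i+1}$, and is bounded as
\begin{equation}
    0 \le 1 - \frac{F_i \cdot F_{i+1}}{\|F_i\|_2 \|F_{i+1}\|_2} \le 2 .
\end{equation}
The lower bound is attained when the two feature vectors are parallel, and the value is 1 when they are orthogonal. Because the term is invariant to positive rescaling of either vector, it constrains the direction of the features but not their magnitude.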



\subsection{Topology-Aware Automatic Prompt Generation}
\label{sec:prompt_gen}
Standard prompt-based methods require manual interaction on every slice, which is impractical for 3D volumes with hundreds of slices. To automate this, we design a \textbf{Topology-Aware Prompt Generator} (Fig.~\ref{fig:figure2}, bottom) that predicts the prompt $p_i$ for the current slice from the prompt and features of the previous slice. It operates in two steps:

\noindent\textbf{\textit{Step 1:} Feature-Driven Offset Prediction.}
Vessels are continuous tubular structures. The center of a vessel in slice $i$ can be inferred from its position in slice $i-1$ plus a displacement vector. 
We employ a lightweight Offset Prediction Module that utilizes the deep features $F$ (from the Prompt Encoder) to estimate this transition. The tentative prompt $\tilde{p}_i$ is calculated as:
\begin{equation}
    \tilde{p}_i = p_{i-1} + \text{Net}_{\text{offset}}(F_{i-1})
\end{equation}
where $p_{i-1}$ is the prompt used in the previous slice. By conditioning the offset on the rich semantic features $F_{i-1}$, the network learns to anticipate the vessel's trajectory (e.g., curvature and branching).
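
As an illustrative example with hypothetical values, write prompts as pixel coordinates $(x, y)$. If the previous prompt is $p_{i-1} = (120, 84)$ and the module outputs the offset $(1.5, -0.8)$, the tentative prompt is
\begin{equation}
    \tilde{p}_i = (120, 84) + (1.5, -0.8) = (121.5, 83.2).
\end{equation}
The offset is thus a sub-pixel displacement in the image plane that follows the in-plane drift of the vessel center from one slice to the next.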

\noindent\textbf{\textit{Step 2:} Confidence-Guided Refinement.}
Relying solely on offset prediction may cause errors to accumulate across slices. To correct this, we refine the prompt using the network's own confidence map.
After obtaining the initial probability map $P_i$ using $\tilde{p}_i$, we locate its peak-confidence position $\arg\max(P_i)$, i.e., the pixel coordinate at which $P_i$ is largest. The final refined prompt $p_i$ is computed as a weighted fusion:
\begin{equation}
    p_i = (1 - \lambda) \cdot \tilde{p}_i + \lambda \cdot \arg\max(P_i)
\end{equation}
where $\lambda \in [0.1, 0.5]$ is an adaptive scalar derived from the peak confidence value of $P_i$. 
This refinement step acts as a self-correction mechanism: if the network is confident in its segmentation (high $\lambda$), the prompt is pulled towards the peak-confidence location; if uncertainty is high, the system relies more on the topological prior $\tilde{p}_i$. This design is intended to keep tracking stable even near complex vascular bifurcations.
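
The exact mapping from peak confidence to $\lambda$ is left open by the description of the fusion step. One simple realization, given purely as an assumption for illustration, maps the peak confidence $c_i = \max P_i \in [0, 1]$ linearly onto the stated interval:
\begin{equation}
    \lambda = 0.1 + 0.4 \, c_i .
\end{equation}
With the hypothetical values $\tilde{p}_i = (121.5, 83.2)$, $\arg\max(P_i) = (124, 82)$, and $c_i = 0.75$, this gives $\lambda = 0.4$ and
\begin{equation}
    p_i = 0.6 \cdot (121.5, 83.2) + 0.4 \cdot (124, 82) = (122.5, 82.72).
\end{equation}
Since $\lambda$ never exceeds 0.5, the refined prompt always retains at least half of its weight on the offset-based estimate $\tilde{p}_i$.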
