\section{Experiments}
\subsection{Dataset}
We evaluate the efficacy and robustness of our proposed method on two distinct vascular datasets: an in-house Fe-MRA dataset and the publicly available SEG.A. 2023 challenge dataset \cite{radl2022Data_avt}.
The Fe-MRA dataset serves to assess performance under weakly supervised conditions for complex peripheral vessels, while the public dataset is employed to rigorously validate cross-modality generalization and segmentation accuracy on the Aortic Vessel Tree in CTA scans.

\noindent\textbf{Fe-MRA Dataset.}
% Our study utilized retrospectively collected FE-MRA data from 50 patients acquired between 2023 and 2024.
% All scans were performed on Siemens scanners with magnetic field strengths of either 3.0 Tesla or 1.5 Tesla.
% The acquisitions exhibit isotropic in-plane resolution of approximately 0.8 $mm$ in the X-Y dimensions, with a slice thickness of approximately 1.0 $mm$ in the Z dimension.
% To establish a reliable ground truth, each volumetric scan was independently annotated by two board-certified radiologists with over 5 years of experience in vascular imaging, with the final reference standards determined through a consensus review process.
{The in-house Fe-MRA dataset was retrospectively collected from Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, between 2023 and 2025, under an IRB-approved protocol. All patient data were anonymized prior to analysis.
It consists of 50 cases exhibiting a wide range of vascular morphologies, including normal vessels, lower extremity varicose veins, and arterial thrombosis. This diversity introduces significant challenges for segmentation, such as severe tortuosity and vessel occlusion.
The images are whole-body MRA scans acquired 30 minutes post-injection of ferumoxytol. To ensure data diversity, scans were performed using two different MRI systems: a 1.5~T Philips scanner and a 3.0~T GE scanner. The dataset features heterogeneous spatial resolutions, with voxel spacing ranging from $0.3 \times 0.3 \times 1.0~\text{mm}$ to $0.8 \times 0.8 \times 1.0~\text{mm}$, and covers key lower-extremity vessels such as the femoral artery/vein and the great saphenous vein.
The ground truth was established through a rigorous two-stage process: initial dense annotation by two junior radiologists, followed by a final review and revision by a senior radiologist to ensure clinical accuracy.}


\noindent\textbf{SEG.A. Dataset.} 
To rigorously evaluate generalization, we further validate our method on the SEG.A. benchmark from the MICCAI 2023 Challenge.
This dataset comprises 56 CTA scans of the Aortic Vessel Tree aggregated from three distinct centers, introducing significant anatomical and scanner variability.
The data exhibits high heterogeneity in resolution, with voxel spacing ranging from $0.44 \times 0.44 \times 0.50~\text{mm}$ to $1.37 \times 1.37 \times 5.00~\text{mm}$.
Notably, the dataset includes large-scale volumes exceeding 1,000 slices along the Z-axis, providing a challenging testbed for processing extensive 3D vascular structures.

\noindent\textbf{Data Splitting.} To ensure a rigorous evaluation and prevent data leakage, we strictly perform data splitting at the patient (volume) level, rather than the slice level. 
For the Fe-MRA dataset, the 50 patients are divided into Training (30), Validation (5), and Testing (15). 
For the SEG.A. dataset, we adhere to the official challenge split guidelines.

\subsection{Implementation Details and Evaluation Metrics}
% To demonstrate the versatility of our framework, we integrated it with two distinct point-prompted architectures: a SAM-based model and a UNet-based model, both adopted from ScribblePrompt \cite{wong_2024ECCV_scribbleprompt}.
% This selection highlights our method's ability to extend arbitrary point-prompted backbones for vascular applications without architectural constraints.
% All models were implemented in PyTorch and trained on a single NVIDIA RTX 4090 GPU (24GB).
% Training spanned 200 epochs using the AdamW optimizer (learning rate: $1 \times 10^{-4}$, weight decay: $1 \times 10^{-5}$) and a cosine annealing scheduler with a 10-epoch warm-up.
\noindent\textbf{Architecture and Initialization.} Our framework is built upon the architecture of ScribblePrompt-SAM \cite{wong_2024ECCV_scribbleprompt}. Specifically, we use the ViT-B backbone as the Image Encoder. Because ScribblePrompt-SAM has been extensively pre-trained on a large-scale collection of medical images, we keep the Image Encoder frozen during training to preserve its feature extraction capabilities and to maintain computational efficiency. Similarly, the standard Prompt Encoder remains frozen, as its positional encodings for point prompts do not require domain-specific adaptation.
Conversely, the Mask Decoder is initialized from ScribblePrompt-SAM but is set to be fully trainable. This allows the model to adapt to the specific topology of thin, tubular vascular structures, which may differ from the general targets seen during pre-training. 
The Mask Decoder is optimized with $\mathcal{L}_{local}$, which enforces continuous vessel boundaries across neighboring slices and penalizes abrupt shape changes.
Finally, our core contribution, the automatic prompt generator, is a lightweight MLP that is fully trainable and randomly initialized.
The automatic prompt generator is optimized with $\mathcal{L}_{global}$ to learn semantically consistent vessel-center representations across slices, guiding it to track the same anatomical structure.
All models were implemented in PyTorch and trained on a single NVIDIA RTX 4090 GPU (24GB).
Training spanned 200 epochs using the AdamW optimizer (learning rate: $1 \times 10^{-4}$, weight decay: $1 \times 10^{-5}$) and a cosine annealing scheduler with a 10-epoch warm-up.
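
As an illustrative sketch of this schedule, assuming a linear warm-up and a final learning rate of zero (neither is specified above), the learning rate at epoch $t$ can be written as
\begin{equation*}
\eta_t =
\begin{cases}
\eta_{\max}\,\dfrac{t}{T_w}, & 0 \le t < T_w, \\[6pt]
\dfrac{\eta_{\max}}{2}\left(1 + \cos\!\left(\pi\,\dfrac{t - T_w}{T - T_w}\right)\right), & T_w \le t \le T,
\end{cases}
\end{equation*}
with $\eta_{\max} = 1 \times 10^{-4}$, $T_w = 10$, and $T = 200$. The symbols $\eta_t$, $\eta_{\max}$, $T_w$, and $T$ are introduced here only for illustration; both branches agree at $t = T_w$, where $\eta_t = \eta_{\max}$.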

\noindent\textbf{Training Protocol.} We randomly sample pairs of adjacent slices $(I_{i-1}, I_i)$ from the training volumes in each iteration. The model utilizes the features and predictions from the previous slice $I_{i-1}$ to predict the offset and generate prompts for the current slice $I_i$. This strategy forces the automatic prompt generator to learn robust slice-to-slice transition logic rather than memorizing specific volumetric positions. During the backward pass, gradients derived from the segmentation loss and our proposed 3D consistency losses are back-propagated exclusively to update the automatic prompt generator and the Mask Decoder.

\noindent\textbf{Evaluation Metrics.} 
We employ a comprehensive set of metrics to assess both geometric accuracy and topological fidelity.
Standard segmentation performance is measured using the Dice Similarity Coefficient (DSC) and the 95\% Hausdorff Distance (HD95).
Given the tubular nature of vascular structures, we specifically include clDice \cite{shit_2021_CVPR_cldice} to evaluate centerline connectivity and Betti Error to quantify topological consistency (e.g., broken vessels or false loops), ensuring clinical reliability.
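
For reference, a sketch of the standard definitions assumed for these metrics is given below. Here $P$ and $G$ denote the predicted and ground-truth voxel sets, $S_P$ and $S_G$ their skeletons, $\partial P$ and $\partial G$ their surfaces, $d(\cdot,\cdot)$ the point-to-surface Euclidean distance, $\mathrm{P}_{95}$ the 95th percentile, and $\beta_0(\cdot)$ the number of connected components; this notation is introduced for illustration only.
\begin{align*}
\mathrm{DSC}(P, G) &= \frac{2\,|P \cap G|}{|P| + |G|}, \\
\mathrm{clDice}(P, G) &= \frac{2\,T_{\mathrm{prec}}\,T_{\mathrm{sens}}}{T_{\mathrm{prec}} + T_{\mathrm{sens}}}, \qquad
T_{\mathrm{prec}} = \frac{|S_P \cap G|}{|S_P|}, \quad
T_{\mathrm{sens}} = \frac{|S_G \cap P|}{|S_G|}, \\
\mathrm{HD95}(P, G) &= \max\Big\{ \underset{p \in \partial P}{\mathrm{P}_{95}}\, d(p, \partial G),\;
\underset{g \in \partial G}{\mathrm{P}_{95}}\, d(g, \partial P) \Big\}, \\
\beta_0\ \text{error} &= \big|\beta_0(P) - \beta_0(G)\big|.
\end{align*}
Variants exist (for example, HD95 computed over the pooled surface distances, or the $\beta_0$ error averaged over patches rather than the whole volume), so the forms above should be read as assumptions rather than as a specification of the evaluation code.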

\subsection{Experimental Results}
To comprehensively evaluate the clinical viability and technical superiority of our point-prompted vascular segmentation framework, we conducted a multi-dimensional comparison against two distinct methodological paradigms. 
The first category represents the \textbf{fully supervised baselines}, including the classic UNet \cite{ronneberger_2015MICCAI_UNet} and the self-configuring nnU-Net \cite{isensee_2021NM_nnUNet}, which require labor-intensive pixel-level annotations.
The second category comprises \textbf{state-of-the-art weakly supervised and interactive methods}, specifically SAMed-2D \cite{cheng_2023arxiv_sammed2d}, MIDeepSeg \cite{luo_2021MIA_mideepseg}, and ScribblePrompt \cite{wong_2024ECCV_scribbleprompt}.
Our analysis considers not only volumetric overlap (Dice) but also, with particular emphasis, topological fidelity (clDice, $\beta_0$ error), which is critical for downstream vascular analysis such as centerline extraction and hemodynamic simulation.

\noindent\textbf{Performance on Public SEG.A. Dataset.} 
Table~\ref{tab:table1} details the quantitative performance on the SEG.A. dataset.
Our method achieves the best results among the point-prompted approaches on every reported metric and narrows the gap to the fully supervised baselines.
We achieve a Dice score of \textbf{86.44\%}, surpassing the closest competitor, ScribblePrompt-SAM, by \textbf{3.44} percentage points. Notably, this result is competitive with the fully supervised nnU-Net (89.41\%), suggesting that our point-prompted strategy can recover the majority of vascular structures with significantly reduced annotation costs.
The HD95 metric, which is sensitive to outliers, is reduced to \textbf{19.46}~mm in our method. This suggests that our approach suppresses false positives in the background, a common issue in SAM-based adaptations where the model struggles with low-contrast medical boundaries.
Most critically, our method excels in preserving vascular connectivity. We outperform ScribblePrompt-SAM by \textbf{4.18} percentage points in clDice and reduce the $\beta_0$ error from 3.26 to \textbf{1.88}. This low topological error implies that our segmentation masks contain fewer fracture points, yielding a more continuous vascular tree structure, which is essential for clinical diagnosis.
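
As a worked check of these margins, computed only from the ScribblePrompt-SAM and Ours (ScribblePrompt-SAM) rows of Table~\ref{tab:table1} (the $\Delta$ notation is introduced here for illustration):
\begin{align*}
\Delta_{\mathrm{Dice}} &= 86.44 - 83.00 = 3.44, \\
\Delta_{\mathrm{clDice}} &= 89.83 - 85.65 = 4.18, \\
\Delta_{\mathrm{HD95}} &= 26.42 - 19.46 = 6.96~\text{mm} \quad (\approx 26.3\%\ \text{relative reduction}), \\
\Delta_{\beta_0} &= 3.26 - 1.88 = 1.38 \quad (\approx 42.3\%\ \text{relative reduction}).
\end{align*}
On the same table, the Dice gap to nnU-Net shrinks from $89.41 - 83.00 = 6.41$ to $89.41 - 86.44 = 2.97$ percentage points, i.e., roughly $3.44 / 6.41 \approx 54\%$ of the gap is closed.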


\begin{figure*}[t] \centering
    \includegraphics[width=0.87\textwidth]{./imgs/Figure-3-witharrow_00.png}
    \caption{Qualitative comparison of vascular segmentation on SEG.A. (top 3 rows) and Fe-MRA (bottom 3 rows). Topological breaks or false positives are observed in baseline methods, whereas our method effectively corrects them.} 
    \label{fig:figure3}
\end{figure*}

\begin{table}[t]
\centering
\caption{{Quantitative comparison on the public SEG.A. segmentation task. Supervised methods serve as the upper bound. Best results among point-prompted methods are highlighted in \textbf{bold}.}}
\label{tab:table1}
\setlength{\tabcolsep}{4pt}
\resizebox{\linewidth}{!}{\begin{tabular}{l l c c c c}
\toprule[1.5pt]
Type & Method & Dice (\% $\uparrow$) & clDice (\% $\uparrow$) & HD95 (mm $\downarrow$) & $\beta_0$ error ($\downarrow$) \\ 
\midrule
\multirow{2}{*}{Supervised}   
& UNet      & 82.08 & 92.56 & 18.26 & 0.14   \\
& nnU-Net    & 89.41 & 96.29 & 9.37 & 0.09   \\ 
\midrule
\multirow{6}{*}{\shortstack[l]{Point Prompt}} 
& SAMed-2D      & 81.21 & 79.23 & 49.41 & 155.33 \\ 
& MIDeepSeg     & 80.91 & 71.56 & 38.08 & 116.28 \\
& ScribblePrompt-UNet & 81.55 & - & 42.39 & -   \\
& \textbf{Ours (ScribblePrompt-UNet)} & 82.72 & - & 35.06 & -  \\ 
& ScribblePrompt-SAM & 83.00 & 85.65 & 26.42 & 3.26   \\
& \textbf{Ours (ScribblePrompt-SAM)} & \textbf{86.44} & \textbf{89.83} & \textbf{19.46} & \textbf{1.88}   \\ 
\bottomrule[1.5pt]
\end{tabular}
}
\end{table}

\noindent\textbf{Robustness on In-house Fe-MRA Dataset.} 
The Fe-MRA dataset represents a significantly more challenging scenario characterized by intricate peripheral vessels and variable contrast-to-noise ratios. As shown in Table~\ref{tab:table2}, this domain shift exposes the fragility of existing methods.
Methods like SAMed-2D and MIDeepSeg exhibit a drastic performance drop (Dice $<73\%$). These models, primarily trained on natural images or standard medical datasets, lack the specific inductive bias required to track thin, branching vessels, leading to fragmented outputs ($\beta_0$ error $>172$).
In contrast, our framework remains robust, with a Dice score of \textbf{80.20\%}. More importantly, we achieve a clDice of \textbf{88.13\%}, which is comparable to our performance on the simpler SEG.A. dataset (89.83\%). This suggests that our slice-to-slice propagation mechanism effectively exploits the 3D coherence of blood vessels, allowing the model to trace vessels even when the local contrast is weak.
The $\beta_0$ error of our method (\textbf{40.25}) is substantially lower than that of ScribblePrompt (67.46). This reduction indicates that our approach is less prone to ``breaking'' thin vessel segments, a frequent artifact in slice-wise 2D segmentation methods that ignore inter-slice consistency.

\begin{table}[t]
\centering
\caption{Quantitative comparison on the challenging in-house Fe-MRA dataset. Note the significant improvement in topological metrics (clDice and $\beta_0$ error) achieved by our method.}\label{tab:table2}
\setlength{\tabcolsep}{5pt}
\begin{tabular}{l c c c c}
\toprule[1.5pt]
Method & Dice ($\% \uparrow$) & clDice ($\% \uparrow$) & HD95 ($mm \downarrow$) & $\beta_0$ error ($\downarrow$) \\ 
\midrule
SAMed-2D       & 69.57 & 61.65 & 38.69 & 207.6   \\
MIDeepSeg      & 72.99 & 67.17 & 38.71 & 172.3   \\
ScribblePrompt & 77.25 & 86.21 & 19.92 & 67.46   \\
\textbf{Ours}  & \textbf{80.20} & \textbf{88.13} & \textbf{11.18} & \textbf{40.25}   \\ 
\bottomrule[1.5pt]
\end{tabular}
\end{table}


\noindent\textbf{Qualitative Analysis.} 
Fig.~\ref{fig:figure3} provides a visual comparison that corroborates our quantitative findings. 
In the SEG.A. dataset (top 3 rows), baseline methods often struggle with boundary definition. For instance, SAMed-2D tends to under-segment the vessel walls, while ScribblePrompt occasionally leaks into adjacent tissues.
The disparity is even more pronounced in the Fe-MRA dataset (bottom 3 rows). As indicated by the red arrows, competing methods produce discontinuous dotted patterns for distal vessels, severing the vascular tree. 
Our method, leveraging the propagated point prompts, successfully reconstructs the complete vascular geometry. The 3D renderings clearly show that our result is the only one that maintains the structural integrity of the entire vascular network without significant topological breaks.


\subsection{Ablation Study}
To investigate the individual contributions of the proposed components to the overall segmentation performance and topological consistency, we conducted a comprehensive ablation study on the SEG.A. dataset. We established a baseline model using the pre-trained SAM-based 2D network with slice-by-slice manual prompting (simulated by using the ground truth center of the previous slice without any learnable offset or refinement). As shown in Table~\ref{tab:ablation}, we progressively incorporated the Local Continuity Loss ($\mathcal{L}_{local}$), Global Consistency Loss ($\mathcal{L}_{global}$), Offset Prediction, and Confidence-Guided Refinement.

\noindent\textbf{Effectiveness of Geometric Constraints.}
The baseline model, which treats 3D segmentation as independent 2D tasks, achieved a Dice score of 81.21\%. Introducing the Local Continuity Loss ($\mathcal{L}_{local}$) improved the boundary smoothness between slices, yielding a 1.84 percentage-point increase in Dice and reducing the HD95 by over 10~mm (from 49.41~mm to 38.12~mm). This indicates that penalizing inter-slice gradient inconsistencies mitigates the ``stacking artifacts'' common in 2D-to-3D adaptation. Furthermore, the addition of the Global Consistency Loss ($\mathcal{L}_{global}$) provided a substantial boost in topological fidelity, increasing clDice from 82.05\% to 85.12\% and lowering the $\beta_0$ error from 89.45 to 24.18. By enforcing feature-level similarity across the volume, $\mathcal{L}_{global}$ prevents semantic drift in slices with low contrast or noise, ensuring the model maintains a stable representation of the vessel throughout the scan.

\noindent\textbf{Impact of Automatic Prompt Generation Strategies.}
A critical innovation of our framework is the replacement of manual prompts with an automated mechanism. We first evaluated the Offset Prediction module (Eq.~4), which predicts the vessel center displacement based on previous features. Adding this mechanism raised the Dice score to 85.33\%, indicating that the network can learn the trajectory of vascular structures. However, relying solely on offset prediction led to accumulated errors in tortuous vessel segments, as indicated by a suboptimal $\beta_0$ error (8.62). Finally, incorporating the Confidence-Guided Refinement (Eq.~5) resulted in the best performance across all metrics. This module acts as a self-correction mechanism; by pulling the prompt towards high-confidence regions, it gained a further 1.32 percentage points in clDice and reduced the topological error ($\beta_0$) to 1.88. This result highlights the benefit of combining historical trajectory priors (offset) with current observational evidence (confidence map) for robust 3D tracking.
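
As a worked decomposition of the cumulative gains, computed only from consecutive rows of Table~\ref{tab:ablation} (terms ordered as $\mathcal{L}_{local}$, $\mathcal{L}_{global}$, Offset, Refine; the $\Delta$ notation is introduced here for illustration):
\begin{align*}
\Delta_{\mathrm{Dice}} &= 1.84 + 1.14 + 1.14 + 1.11 = 5.23 \quad (81.21 \rightarrow 86.44), \\
\Delta_{\mathrm{clDice}} &= 2.82 + 3.07 + 3.39 + 1.32 = 10.60 \quad (79.23 \rightarrow 89.83).
\end{align*}
Because the components are added cumulatively in a fixed order, these increments describe the marginal effect of each component given the preceding ones, not its isolated contribution.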

\begin{table}[t]
\setlength{\tabcolsep}{1.0mm}
\centering
\caption{Ablation study on the SEG.A. dataset. We progressively add components to the Baseline (Naive Slice-wise SAM). $\mathcal{L}_{local}$: Local Continuity Loss; $\mathcal{L}_{global}$: Global Consistency Loss; \textit{Offset}: Feature-Driven Offset Prediction; \textit{Refine}: Confidence-Guided Refinement.}
\label{tab:ablation}
\resizebox{\linewidth}{!}{%
\begin{tabular}{ccccc|cccc}
\toprule[1.5pt]
Baseline & $\mathcal{L}_{local}$ & $\mathcal{L}_{global}$ & Offset & Refine & Dice (\% $\uparrow$) & clDice (\% $\uparrow$) & HD95 (mm $\downarrow$) & $\beta_0$ error ($\downarrow$) \\ \midrule
\checkmark &  &  &  &  & 81.21 & 79.23 & 49.41 & 155.33 \\
\checkmark & \checkmark &  &  &  & 83.05 & 82.05 & 38.12 & 89.45 \\
\checkmark & \checkmark & \checkmark &  &  & 84.19 & 85.12 & 25.60 & 24.18 \\
\checkmark & \checkmark & \checkmark & \checkmark &  & 85.33 & 88.51 & 21.05 & 8.62 \\
\checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \textbf{86.44} & \textbf{89.83} & \textbf{19.46} & \textbf{1.88} \\ \bottomrule[1.5pt]
\end{tabular}%
}
\end{table}

\noindent\textbf{Impact of Propagation Strategies.}
To investigate the optimal strategy for propagating vessel information across slices, we conducted an ablation study comparing our method with registration baselines and the prompt-based $\mu$SAM model \cite{archit2025NM_usam}. The results are presented in Table \ref{tab:propagation_ablation}. 
For the registration baselines, we applied Rigid and Affine transformations to propagate the mask from the previous slice to the current slice. We evaluated two point anchors: the geometric center and centroid of the vessel mask.
% As observed, registration based on the mask centroid consistently outperforms the center-based approach.
As observed, centroid-based anchoring is more effective for affine registration (84.42\% vs.\ 82.57\% Dice), although it does not improve the rigid baseline (54.21\% vs.\ 59.29\%).
The large gap between the rigid and affine variants suggests that tracking the anatomical trajectory of the vessel is crucial.
However, even the best-performing baseline (Affine-Centroid) yields a lower Dice score compared to our method. This limitation stems from the fact that vessels undergo complex, non-linear morphological deformations across slices that simple rigid or affine transformations cannot model.
While $\mu$SAM offers a simple, training-free solution by directly projecting masks from adjacent slices, its accuracy is limited (83.33\%) due to the lack of deformation modeling. Our method, which explicitly learns inter-slice relationships, achieves a superior Dice score of 86.44\%.
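
As a rough worked example of the latency difference, assuming the per-slice times in Table~\ref{tab:propagation_ablation} stay constant over a volume and taking a hypothetical volume of $N = 1000$ slices (the scale of the largest SEG.A. scans):
\begin{align*}
t_{\mathrm{ours}} &= 1000 \times 0.10~\text{s} = 100~\text{s}, \\
t_{\mathrm{affine}} &= 1000 \times 0.56~\text{s} = 560~\text{s}, \\
\frac{t_{\mathrm{affine}}}{t_{\mathrm{ours}}} &= \frac{0.56}{0.10} = 5.6 .
\end{align*}
The corresponding ratio for the rigid baselines is $0.55 / 0.10 = 5.5$. No inference time is reported for $\mu$SAM, so no such comparison is made for it.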
% Although this learning process incurs higher computational costs than $\mu$SAM's direct projection, the significant accuracy gain justifies the overhead.



\begin{table}[t]
\setlength{\tabcolsep}{6mm}
    \centering
    \caption{{Comparison of different slice-to-slice propagation strategies. 
    We compare our proposed method against classical registration methods and the recent training-free model $\mu$SAM. Rigid and Affine denote using the center or centroid point from the previous slice, transformed by rigid or affine registration, as the prompt for the current slice. Our method achieves the highest Dice score and a lower inference time than the registration baselines; the inference time of $\mu$SAM is not reported.}}
    \label{tab:propagation_ablation}
\begin{tabular}{lcc}
\toprule[1.5pt]
Method & Dice (\% $\uparrow$)  & Inference Time (s/slice) \\ 
\midrule
Rigid (center)  & 59.29 & 0.55                     \\
Rigid (centroid) & 54.21 & 0.55                     \\
Affine (center) & 82.57 & 0.56                     \\
Affine (centroid) & 84.42 & 0.56                     \\
$\mu$SAM   & 83.33 & -                        \\
Ours    & 86.44 & 0.10         \\ 
\bottomrule[1.5pt]
\end{tabular}

\end{table}