\section{Introduction}
Cardiovascular diseases remain the leading cause of death worldwide. According to the World Health Organization, approximately 17.9 million people die from these diseases each year, accounting for 32\% of global deaths \cite{chen_2020FCM_deep}. Accurate vascular morphometry analysis is therefore crucial for early diagnosis, risk assessment, and treatment planning \cite{sweeney_2024AS_unsupervised,zeng_2024SR_pretrained}.

Vascular segmentation, a fundamental task in medical image analysis, aims to accurately extract vascular structures from medical images so that vessel trajectory, morphology, diameter, and other clinical biomarkers can be derived \cite{liu2022edge,qi_2023ICCV_snake}. These details are vital for diagnosing vascular conditions such as stenosis and aneurysms, and they are valuable for surgical planning and intervention guidance \cite{fu_2023TMI_robust,yao_2023CBM_enhancing}. However, significant challenges persist because of the complexity of vascular structures, limitations in imaging quality, and pathological alterations that complicate automated analysis \cite{Ace_The_MICCAI2024,Shi_Centerline_MICCAI2024}.

\begin{figure*}[t] \centering
    \includegraphics[width=\textwidth]{./imgs/VesselFigure1.pdf}
    \caption{
    Motivation and Performance Overview.
    Comparison of three paradigms for 3D vessel segmentation.
    (Left) Fully Supervised Methods (e.g., nnU-Net): Achieve high accuracy but require labor-intensive voxel-wise annotations ($\sim$8 hours/vessel).
    (Middle) Naive 2D Prompt Methods (e.g., ScribblePrompt): Apply slice-by-slice prompting, which is still time-consuming ($\sim$25 min/vessel) and often leads to topological discontinuities (broken vessels) due to the lack of 3D context.
    (Right) Our Method: Starting from a single initialization point, our approach automatically generates prompts via topological continuity learning. It achieves performance comparable to supervised methods with orders-of-magnitude faster interaction ($\sim$2 sec/vessel) while producing coherent 3D structures.
    } 
    \label{fig:figure1}
\end{figure*}

Traditional manual segmentation is labor-intensive and prone to inter-observer variability. While deep learning-based methods have automated segmentation in mainstream modalities such as coronary computed tomography angiography (CTA) \cite{Dai_RIPAV_MICCAI2024,Gen_Force_MICCAI2024,qiu2023corsegrec,Xie_DSNet_MICCAI2024}, their success relies heavily on large-scale, high-quality annotated datasets \cite{guo_2024tmi_3d}. This data dependency becomes a bottleneck for emerging imaging techniques such as Fe-MRA and for underrepresented vascular regions such as peripheral vessels \cite{ghodrati_2022MRM_automatic}.
Recently, promptable foundation models, such as the Segment Anything Model (SAM), have emerged as a promising solution to annotation scarcity due to their impressive zero-shot generalization capabilities \cite{zhang2025dcm, zhang2025atlas}. 
However, a critical limitation hinders their application in 3D medical imaging: standard foundation models are inherently 2D. 
Applying them to 3D volumes typically requires slice-by-slice prompting, which disrupts the 3D spatial continuity of vascular structures and results in prohibitive interaction costs \cite{magg2025midl_zero}.
This issue is particularly pronounced in complex scenarios like Fe-MRA, where sparse acquisition protocols lead to topological discontinuities \cite{si_2025MJMRI_exploring}, making simple slice-wise propagation unreliable \cite{si_2024NSR_unveiling}.

{To address the dimensionality gap, recent efforts have sought to adapt foundation models for 3D medical tasks, generally falling into training-free or trainable categories \cite{zhang2025atlas}. 
Training-free methods, such as $\mu$SAM \cite{archit2025NM_usam} and MedicoSAM \cite{archit2025TMI_medicosam}, typically employ projection-based strategies where masks are projected to adjacent slices to derive prompts. Some approaches also attempt to leverage image registration to align adjacent slices for label propagation. However, registration-based methods are often computationally expensive and prone to failure when dealing with non-rigid deformations or abrupt topological changes common in vascular structures.
More broadly, training-free approaches lack inherent 3D context and rely heavily on the assumption of minimal inter-slice variation, so errors accumulate in datasets with anisotropic resolution or large slice spacing.
Conversely, trainable adaptations such as MedSAM2 \cite{ma2025_medsam2} and PAM \cite{chen2025NPJDM_pam} introduce memory banks or cross-attention mechanisms to capture volumetric context. While effective, MedSAM2 treats static 3D volumes as temporal video sequences, incurring significant computational overhead from its heavy memory modules. PAM's slice-to-slice attention mechanism, in turn, can struggle with long-range dependencies or abrupt topological changes that are not captured by immediate neighbors. Consequently, there remains a need for an efficient, topology-aware strategy that explicitly models anatomical spatial consistency without the heavy computational burden of video-based architectures.}


To bridge the gap between 2D foundation models and 3D vascular segmentation, we propose a framework that adapts SAM to volumetric data through \textit{automatic prompt generation}.
As illustrated in Fig.~\ref{fig:figure1}, our approach addresses the inefficiency of slice-wise interaction by leveraging the inherent geometric continuity of vessels.
Instead of requiring dense prompts for every slice, our method requires only a single initial point. It then employs a topology-aware strategy to automatically propagate prompts across the entire 3D volume, stitching the 2D predictions into a coherent 3D structure.
This strategy integrates two key innovations: 
(1) an end-to-end segmentation framework with global and local continuity constraints to overcome spatial inconsistencies; and 
(2) a confidence-aware prompt generation mechanism that exploits vascular topology to refine and extend segmentation cues iteratively.
By combining the generalization power of SAM with explicit vascular priors, we reduce the annotation time from hours to seconds while maintaining high topological consistency.

The main contributions are summarized as follows:
\begin{itemize}
\item We introduce a novel topology-constrained framework that extends 2D promptable models to 3D vascular segmentation, requiring only a single initialization point to segment the full volume.
\item We develop a topology-driven automatic prompt generation strategy that leverages vascular structural continuity with iterative confidence-aware refinement, significantly reducing interaction costs compared to slice-wise prompting.
\item Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art performance with 86.44\% Dice on a public CTA dataset and 80.20\% Dice on an in-house Fe-MRA dataset, indicating strong generalization across modalities and vascular complexities.
\end{itemize}