\section{Experiments}\label{sec:experiments}
We evaluate our method on four vascular datasets, covering both single and multi-tree scenarios.
Consistent with prior work \cite{trexplorer_super2025, trexplorer2024, vesselformer2024}, all reported experiments use ground-truth root locations as input to our adapted TEASAR algorithm. Dataset-specific modifications of the training procedure, together with details on sample sizes and data splits, are given in Suppl. \sectionref{sec:ext_experiments}. The supplement also includes an ablation study quantifying the contribution of each component of our method (Suppl. \sectionref{sec:ablation_study}), a vector noise sensitivity study (Suppl. \sectionref{sec:vector noise sensitivity}), and a test-time noise sensitivity study (Suppl. \sectionref{sec:gaussian noise sensitivity}). Finally, we evaluate the effect of replacing the U-Net in our method with nnU-Net~\citep{isensee2021nnu} in Suppl.~\sectionref{sec:nnunet}.

\subsection{Model Architecture and Training}
For segmentation and vector prediction, we employ a 4-layer U-Net~\cite{unet_ronneberger2015} with batch normalization and 16 initial feature channels, which double at each downsampling step.
The network is trained on randomly sampled input patches.
Data augmentation includes intensity shifts and randomly masking out $3\times3\times3$ pixel crops.
Training is performed for 300{,}000 iterations with a batch size of 1 using the Adam optimizer.
Aside from dataset-specific input sizes and augmentations, the architecture and training protocol are kept identical across all datasets. Further training details and dataset-specific settings are provided in Suppl.~\sectionref{sec:training_settings}.
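
As a small worked illustration of this channel schedule, and under the assumption (made here only for illustration) that the four layers correspond to four resolution levels indexed $\ell = 0, \dots, 3$, the number of feature channels at level $\ell$ would be
\[
  c_\ell = 16 \cdot 2^{\ell}, \qquad (c_0, c_1, c_2, c_3) = (16, 32, 64, 128).
\]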

\subsection{Case 1: Single-Tree Data}
For the single-tree datasets, we compare our method against Vesselformer \cite{vesselformer2024}, Trexplorer \cite{trexplorer2024}, and Trexplorer-super \cite{trexplorer_super2025}.
We take all baseline results and metrics from \cite{trexplorer_super2025}: because we were unable to reproduce their published results, we cannot faithfully evaluate the baselines with our new metrics.
Instead, we follow their evaluation protocol to enable a fair comparison.
We therefore report point-level F1, precision, recall, and radius MAE, as well as branch-level F1 and Betti scores, in \tableref{tab:point_metrics}.
In addition, we evaluate our method on the single-tree datasets with our own metrics in \tableref{tab:single tree our metrics} to support future benchmarking.

The \textbf{Single-Tree Synthetic} dataset, introduced in \citet{trexplorer_super2025}, is generated using the Synthetic Vascular Toolkit (SVT) \cite{sexton2025svt}.
Each volume contains a single vascular tree, its segmentation mask, and the corresponding 3D centerline graph.
As shown in \tableref{tab:point_metrics}, our model outperforms the current state of the art on point-level F1, precision, and recall as well as on branch-level F1, while Trexplorer-super attains a lower radius MAE.
Suppl.\ \figureref{fig:qualitative_syn_st} shows qualitative results.

The publicly available \textbf{Parse 2022} pulmonary artery segmentation dataset \cite{luo2024parse} contains 100 computed tomography pulmonary angiography (CTPA) volumes with pixel-wise segmentation masks.
These masks were created semi-automatically by experts using a region-growing approach.
\citet{trexplorer_super2025} subsequently derived centerlines from these masks using the Kimimaro TEASAR implementation \cite{Silversmith_Kimimaro_Skeletonize_densely_2021}.
Note that these ground-truth centerlines were generated \textit{automatically}.
As shown in \tableref{tab:point_metrics}, our model outperforms the current state-of-the-art on both point-level and branch-level F1.
However, at the graph level we observe some Betti-0 errors: although our post-processing step is designed to correct false splits, it does not guarantee global connectivity of the predicted vascular tree.
Suppl.\ \figureref{fig:qualitative_parse} shows qualitative results. We note that a potential bias may favor our method, since the ground-truth skeletons are generated using the TEASAR algorithm.

% ----------------- Single-tree results, trexplorer-super datasets  ---------------------------
\begin{table}
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\caption{Quantitative comparison of our method with Vesselformer, Trexplorer, and Trexplorer-super on the Single-Tree Synthetic and Parse 2022 datasets.
All baseline results and metrics are taken from \citet{trexplorer_super2025}.
Our results are reported as mean and standard deviation (±) over three independent runs, while baseline results are reported over five runs.
}
\begin{tabular}{clccccccc}
\toprule
& \multirow{2}{*}{Model} 
& \multicolumn{4}{c}{Point Level} 
& \multicolumn{1}{c}{Branch Level}
& \multicolumn{2}{c}{Graph Level}\\
\cmidrule(lr){3-6} \cmidrule(lr){7-7}\cmidrule(lr){8-9}
& & F1$\uparrow$ & Prec$\uparrow$ & Rec$\uparrow$ & Rad.(MAE)$\downarrow$ 
 & F1$\uparrow$ & $\beta_0\downarrow$ & $\beta_1\downarrow$\\
\midrule
\parbox[t]{1mm}{\multirow{4}{*}{\rotatebox[origin=c]{90}{\textbf{\tiny Synthetic}}}}
& Vesselformer & $48.18${\tiny $\pm5.62$} & $44.53${\tiny $\pm7.87$} & $61.52${\tiny $\pm1.14$} & $0.42${\tiny $\pm0.01$} & $15.95${\tiny $\pm0.36$}& $81.7${\tiny $\pm16.8$} & $653.5${\tiny $\pm138.7$}\\
& Trexplorer & $39.40${\tiny $\pm8.62$} & $30.91${\tiny $\pm9.45$} & $78.21${\tiny $\pm4.13$} & $0.23${\tiny $\pm0.03$} & $26.26${\tiny $\pm7.18$} & $0${\tiny $\pm0.0$} & $0${\tiny $\pm0.0$}\\
& Trexpl. Super & $77.83${\tiny $\pm1.89$} & $91.91${\tiny $\pm3.28$} & $70.44${\tiny $\pm3.02$} & $\mathbf{0.1}${\tiny $\pm0.01$} & $77.12${\tiny $\pm1.59$} & $0${\tiny $\pm0.0$} & $0${\tiny $\pm0.0$}\\
& \textbf{Ours} & $\mathbf{92.25}${\tiny $\pm0.02$} & $\mathbf{95.49}${\tiny $\pm0.01$} & $\mathbf{89.24}${\tiny $\pm0.05$} & $0.29${\tiny $\pm0.00$} & $\mathbf{81.50}${\tiny $\pm0.16$} & $0${\tiny $\pm0.0$} & $0${\tiny $\pm0.0$}\\
\midrule
\parbox[t]{1mm}{\multirow{4}{*}{\rotatebox[origin=c]{90}{\textbf{\tiny Parse2022}}}}
& Vesselformer & $16.43${\tiny $\pm0.78$} & $18.49${\tiny $\pm1.84$} & $15.28${\tiny $\pm0.83$} & $1.11${\tiny $\pm0.03$} & $1.99${\tiny $\pm0.16$} & $410${\tiny $\pm23.9$} & $246.7${\tiny $\pm78.1$}\\
& Trexplorer & $10.01${\tiny $\pm4.98$} & $9.87${\tiny $\pm3.76$} & $12.01${\tiny $\pm7.46$} & $1.21${\tiny $\pm0.30$} & $3.71${\tiny $\pm1.91$} & $0${\tiny $\pm0.0$} & $0${\tiny $\pm0.0$}\\
& Trexpl. Super & $39.46${\tiny $\pm1.93$} & $55.27${\tiny $\pm3.00$} & $33.99${\tiny $\pm3.34$} & $\mathbf{0.56}${\tiny $\pm0.01$} & $23.46${\tiny $\pm1.09$} & $0${\tiny $\pm0.0$} & $0${\tiny $\pm0.0$}\\
& \textbf{Ours} & $\mathbf{57.52}${\tiny $\pm0.66$} & $\mathbf{59.11}${\tiny $\pm0.37$} & $\mathbf{57.81}${\tiny $\pm0.89$} & $0.58${\tiny $\pm0.02$} & $\mathbf{35.33}${\tiny $\pm1.19$} & $1.85${\tiny $\pm0.46$} & $0${\tiny $\pm0.0$}\\

\bottomrule
\end{tabular}
\label{tab:point_metrics}
\end{table}

\begin{table}
    \centering
    %\setlength{\tabcolsep}{5pt}
    \scriptsize
    \caption{\label{tab:single tree our metrics}Quantitative results of our method on the Single-Tree datasets using our proposed evaluation metrics to support future benchmarking.
    We report mean and standard deviation (±) over three independent runs.
    %Lower FM and FS values indicate better topology preservation.
    %with segmentation-based baselines
    }
    \begin{tblr}{width=\linewidth, rows={abovesep=1pt, belowsep=1pt},
    } %, rows={abovesep=1pt, belowsep=1pt}
        \midrule
        \SetCell[r=2]{l}{Dataset}
            & \SetCell[c=3]{c}{Edges} & & & 
            \SetCell[c=2]{c}{FM$\downarrow$} & & \SetCell[c=2]{c}{FS$\downarrow$} &\\
        \cmidrule[lr]{2-4} \cmidrule[lr]{5-6} \cmidrule[lr]{7-8}
         & F1$\uparrow$ & Prec$\uparrow$ & Rec$\uparrow$
            & Rel. & Abs. & Rel. & Abs. \\
        \midrule
        Synthetic & 
          $0.89${\tiny $\pm0.001$} & $0.93${\tiny $\pm0.001$} & $0.87${\tiny $\pm0.001$} & 
          $0.010${\tiny $\pm0.0$} & $25.86${\tiny $\pm0.10$}  & $0.009${\tiny $\pm0.0$}  & $25.86${\tiny $\pm0.10$}
           \\
        %\midrule
        Parse2022 &
          $0.69${\tiny $\pm0.015$}  & $0.90${\tiny $\pm0.004$} & $0.57${\tiny $\pm0.012$} &  
          $0.007${\tiny $\pm0.0$} & $83.8${\tiny $\pm1.37$}  & $0.004${\tiny $\pm0.0$}  & $85.62${\tiny $\pm1.68$}  \\
        \bottomrule
    \end{tblr}
\end{table}


\subsection{Case 2: Multi-Tree Data}
Although vascular networks are ideally single-tree structures, real data often contain multiple trees, either because arteries and veins are difficult to separate or because of imaging artifacts.
Here, we report our recommended metrics, namely edge-level F1, precision, recall, false merges (FM), and false splits (FS).
We compare Vesselpose against segmentation-based approaches, whose resulting binary masks we skeletonize with Kimimaro TEASAR.
%Because these baselines output binary masks, we extract centerlines using TEASAR (Kimimaro).
%\cite{Silversmith_Kimimaro_Skeletonize_densely_2021}
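
For reference, and assuming the standard definition, the edge-level F1 score is the harmonic mean of edge-level precision $P$ and recall $R$ (the symbols $P$ and $R$ are introduced here only for this illustration):
\[
  \mathrm{F1} = \frac{2\,P\,R}{P + R}.
\]
As a worked example, $P = 0.79$ and $R = 0.80$ give $\mathrm{F1} = 2 \cdot 0.79 \cdot 0.80 / (0.79 + 0.80) \approx 0.795$. Values averaged over volumes and runs need not satisfy this identity exactly.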

The \textbf{Multi-Tree Synthetic} dataset originates from \citet{tetteh2019_deepvesselnet} and is generated using vessel formation simulations \cite{schneider2012tissue}.
We compare our method to vesselFM \cite{Wittmann_2025_CVPR} and a standard U-Net \cite{unet_ronneberger2015}.
We also report an upper bound for segmentation-based approaches by applying TEASAR directly to the ground-truth masks of \citet{schneider2012tissue}.
The results in \tableref{tab:multi_tree_comp} show that our method consistently outperforms these segmentation-based baselines.
The original TEASAR produces one tree per connected component, but the baseline segmentations frequently merge distinct trees into a single component.
As a result, their false merge rates are substantially higher than those of our method.
In \figureref{fig:qualitative_multitree_synthetic} we show qualitative results and discuss failure cases of our method.

The \textbf{Multi-Tree Micro-CT} data were acquired from rat hearts perfused with a solidifying Microfil contrast agent, following a protocol broadly similar to that described in \cite{napieczynska2024muct}. This approach provides strong vascular contrast and enables visualization of small vessels.
The dataset is still under study and may be made publicly available at a later stage.
We use four rat heart volumes: one to fine-tune a U-Net pretrained on the synthetic multi-tree data, and three for validation and testing. For these, we annotated three $400 \times 400 \times 400$ pixel crops using CATMAID \cite{catmaid2009,catmaid2016}.
Details on the data, model, and fine-tuning procedure are provided in Suppl.\ \sectionref{sec:ext_micro_ct}.
\tableref{tab:multi_tree_comp} shows that our method outperforms a standard U-Net with TEASAR in edge-level F1, precision, and recall, as well as in relative FM and FS.
Although the dataset is relatively small, the observed improvement is consistent with the improvements on the other datasets.
Our higher absolute FM and FS values stem from reconstructing more complete skeletons, whereas the U-Net with regular TEASAR misses large graph regions. This is reflected in our correspondingly lower relative FM/FS values and is also visible in Suppl. \figureref{fig:qualitative_micro-ct}.
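
Written out, the normalization behind the relative values (absolute counts divided by the total number of predicted edges, as stated in the caption of \tableref{tab:multi_tree_comp}) reads
\[
  \mathrm{FM}_{\mathrm{rel}} = \frac{\mathrm{FM}_{\mathrm{abs}}}{|E_{\mathrm{pred}}|},
  \qquad
  \mathrm{FS}_{\mathrm{rel}} = \frac{\mathrm{FS}_{\mathrm{abs}}}{|E_{\mathrm{pred}}|},
\]
where $E_{\mathrm{pred}}$ is notation introduced here for the set of predicted edges. A prediction with more edges can therefore have a higher absolute count and still a lower relative value.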
 
% ----------------- Multi-tree datasets ---------------------------
%[htbp!]
\begin{table}
    \centering
    %\setlength{\tabcolsep}{5pt}
    \scriptsize
    \caption{\label{tab:multi_tree_comp}Quantitative comparison of our method on the Multi-Tree Synthetic and Micro-CT Heart datasets.
    We compare against U-Net, vesselFM, and the ground-truth (GT) segmentation, each followed by TEASAR skeletonization.
    %Lower FM and FS values indicate better topology preservation.
    Because a more complete prediction can yield higher absolute FM and FS counts than an incomplete graph, we additionally report relative FM/FS values, obtained by dividing the absolute counts by the total number of predicted edges.
    We report mean and standard deviation (±) over three independent runs (except for vesselFM and GT).
    }
    \begin{tblr}{width=\linewidth, rows={abovesep=1pt, belowsep=1pt},
    cell{3}{1} = {r=4}{cmd=\rotatebox{90}},
    cell{7}{1} = {r=2}{cmd=\rotatebox{90}},
    } %, rows={abovesep=1pt, belowsep=1pt}
        \midrule
        & \SetCell[r=2]{l}{Method}  
            & \SetCell[c=3]{c}{Edges} & & & 
            \SetCell[c=2]{c}{FM$\downarrow$} & & \SetCell[c=2]{c}{FS$\downarrow$} &\\
        \cmidrule[lr]{3-5} \cmidrule[lr]{6-7} \cmidrule[lr]{8-9}
            & & F1$\uparrow$ & Prec$\uparrow$ & Rec$\uparrow$
            & Rel. & Abs. & Rel. & Abs. \\
        \midrule
        %\SetCell[c=8]{l}{\textbf{Multi-Tree Synthetic}}\\
        %\midrule
        \textbf{\makecell{\tiny Multi-Tree\\\tiny Synthetic}} & U-Net 
        %\textbf{\makecell{MT-syn}} & U-Net 
          & $0.46${\tiny $\pm0.001$} & $0.64${\tiny $\pm0.002$} & $0.36${\tiny $\pm0.001$} 
          & $0.02${\tiny $\pm0$} & $51.87${\tiny $\pm0.68$} 
          & $0.01${\tiny $\pm0.002$} & $51.70${\tiny $\pm3.39$} \\
        & vesselFM 
            & 0.46 & 0.62 & 0.36 
            & 0.02 & 51.3 
            & 0.01 & 58.3 \\
        & GT Segm 
            & 0.46 & 0.64 & 0.36 
            & 0.02 & 52.1 
            & 0.01 & 38.4 \\
        & Ours
            & $\mathbf{0.80}${\tiny $\pm0.002$} & $\mathbf{0.79}${\tiny $\pm0.002$} & $\mathbf{0.80}${\tiny $\pm0.001$}
            & $\mathbf{0.007}${\tiny $\pm0$} & $\mathbf{30.80}${\tiny $\pm1.10$}
            & $\mathbf{0.007}${\tiny $\pm0$} & $\mathbf{29.67}${\tiny $\pm1.80$} \\
        \midrule
        %\SetCell[c=8]{l}{\textbf{Micro-CT Heart}}\\
        %\midrule
        \textbf{\makecell{\tiny Micro\\\tiny CT}} & U-Net % \\ Heart 
        %\textbf{\makecell{$\mathbf{\mu CT}$}} & U-Net % \\ Heart 
            & $0.32${\tiny $\pm0.03$} & $0.22${\tiny $\pm0.04$} & $0.57${\tiny $\pm0.01$} 
            & $0.01${\tiny $\pm0$} & $\mathbf{26.25}${\tiny $\pm2.25$} 
            & $0.009${\tiny $\pm0$} & $\mathbf{23.5}${\tiny $\pm4.2$} \\
        & Ours 
            & $\mathbf{0.50}${\tiny $\pm0.002$} & $\mathbf{0.43}${\tiny $\pm0.001$} & $\mathbf{0.63}${\tiny $\pm0.002$}
            & $\mathbf{0.006}${\tiny $\pm0$} & $45.5${\tiny $\pm1.4$}
            & $\mathbf{0.006}${\tiny $\pm0$} & $42.5${\tiny $\pm1.4$} \\
        \bottomrule
    \end{tblr}
\end{table}
%[htbp!]
\begin{figure}
  \centering

  % ------------------ Row 1: 3 images across 0.9 linewidth ------------------
  \makebox[0.9\linewidth]{%
    \subfigure[Ground-truth]{%
      \includegraphics[width=0.30\linewidth]{figures/qual_multi_tree_synth/multitree_GT.jpg}
    }\hfill
    \subfigure[Ours]{%
      \includegraphics[width=0.30\linewidth]{figures/qual_multi_tree_synth/multitree_ours.jpg}
    }\hfill
    \subfigure[U-Net + TEASAR]{%
      \includegraphics[width=0.30\linewidth]{figures/qual_multi_tree_synth/multitree_baseline.jpg}
    }%
  }

  \vspace{4pt}

  % ------------------ Row 2: 2 images across 0.9 linewidth ------------------
  \makebox[0.9\linewidth]{%
    \subfigure[Ground-truth]{%
      \includegraphics[width=0.45\linewidth]{figures/qual_multi_tree_synth/fm_tree_GT.jpg}
    }\hfill
    \subfigure[Ours]{%
      \includegraphics[width=0.45\linewidth]{figures/qual_multi_tree_synth/fm_tree_ours.jpg}
    }%
  }

  \caption{\textbf{Qualitative results for the multi-tree synthetic dataset.}
  First row: Segmentation mask and skeletons overlaid, where each color represents a distinct tree. 
  Our approach separates most trees, whereas U-Net + TEASAR merges all trees into one component.
  Second row: Failure cases for our method, including missed small terminal branches (red arrows) 
  and falsely merged trees (red rectangle).}
  \label{fig:qualitative_multitree_synthetic}
\end{figure}
