\section{Results and Evaluation}

This section evaluates our semantic generative model. Section \ref{sec:3_experiments} describes the experimental setup, and Section \ref{sec:3_analysis} presents and analyzes the results.

\subsection{Experimental Setup} \label{sec:3_experiments}

We describe the evaluation datasets, the evaluation methodology and baselines, and the training and sampling setup.


\paragraph{Datasets} We evaluate our model on two distinct vascular geometry datasets. TopCoW \cite{topcowchallenge} contains 125 semantic segmentations of the circle of Willis (CoW) that cover its anatomical variations. An anatomical, semantic map of the CoW vasculature is provided in Appendix \ref{sec:5_map}.
VascuSynth \cite{hamarneh2010vascusynth} consists of 120 synthetic vascular trees with a large variation in the number of bifurcations per tree, but it provides no semantic labels. For both datasets, we sample 200,000 points from each shape surface and normalize them globally to a $[-1, 1]$ bounding box.
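As an illustration, if ``globally'' is taken to mean a single center and scale shared by all shapes of a dataset (an interpretation assumed here, which preserves relative vessel sizes), the normalization can be written as
\[
% Assumed dataset-wide normalization: one center c and one scale s for all shapes.
\hat{\mathbf{x}} = \frac{\mathbf{x} - \mathbf{c}}{s}, \qquad \mathbf{c} = \frac{\mathbf{l} + \mathbf{u}}{2}, \qquad s = \frac{1}{2}\max_{d}\left(u_d - l_d\right),
\]
where $\mathbf{l}$ and $\mathbf{u}$ are the coordinate-wise minimum and maximum over all sampled points of all shapes in the dataset, so that every coordinate of $\hat{\mathbf{x}}$ lies in $[-1, 1]$.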


\paragraph{Evaluation Methodology} Generative models should produce samples that are representative of the real data, diverse, and unique, that is, not present in the training set. We assess representativeness and diversity with a two-fold evaluation strategy and additionally assess uniqueness. First, since our primary objective is to generate semantic cerebral vessel trees, we evaluate performance on the TopCoW dataset. We use the method of \citet{kuipers2024generating} as our baseline, as it also generates semantic vessel trees. Using the semantic labels, we assess representativeness and diversity by comparing the distributions of vessel length, average radius, and tortuosity between synthetic and real samples. Second, we evaluate on VascuSynth and compare against TrIND \cite{sinha2024representing}, a recent method that also uses implicit neural representations to generate vessel trees. Since VascuSynth provides no semantic labels and TrIND does not generate them, we assess generative performance using global shape metrics: 1-nearest neighbor accuracy for representativeness and coverage for diversity. Details on these metrics are provided in Appendix \ref{sec:5_metrics}. Finally, we evaluate the ability of our model to generate unique vessel trees on both TopCoW and VascuSynth.
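For reference, 1-nearest neighbor accuracy (1-NNA) and coverage (COV) are commonly defined as follows, with the shape distance $D$ assumed here to be the Chamfer distance; the exact variants are specified in Appendix \ref{sec:5_metrics}.
\[
% Commonly used definitions (assumed form); S_g: synthetic set, S_r: real reference set.
\mathrm{1\mbox{-}NNA}(S_g, S_r) = \frac{\sum_{X \in S_g} \mathbf{1}[N_X \in S_g] + \sum_{Y \in S_r} \mathbf{1}[N_Y \in S_r]}{|S_g| + |S_r|},
\]
\[
\mathrm{COV}(S_g, S_r) = \frac{\left|\left\{\arg\min_{Y \in S_r} D(X, Y) : X \in S_g\right\}\right|}{|S_r|},
\]
where $S_g$ and $S_r$ are the synthetic and reference sets, $N_X$ is the nearest neighbor of $X$ under $D$ in $(S_g \cup S_r) \setminus \{X\}$, and $\mathbf{1}[\cdot]$ is the indicator function. A 1-NNA of 50\% means that a nearest-neighbor classifier cannot separate synthetic from real shapes, whereas COV is the fraction of reference shapes that are the closest match of at least one synthetic shape.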


\paragraph{Model Training and Sampling} We sample 2,048 points from the shape surface as input to the VAE, which are subsequently downsampled to 256 points using cross-attention. An additional 2,048 surface points and 1,024 off-surface points are sampled to compute the loss. Both the VAE and the latent-diffusion model use six self-attention blocks.
Detailed model architectures are provided in Appendix \ref{sec:5_architectures}.
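One common way to realize such a downsampling, assumed here purely for illustration, is a single cross-attention layer. In this layer, a subset of 256 input points provides the queries and all 2,048 input points provide the keys and values. The subset could, for example, be selected by farthest point sampling, which is a hypothetical choice.
\[
% Assumed cross-attention downsampling: 256 queries attend to 2,048 keys and values.
\mathbf{Z} = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d}}\right)\mathbf{V}, \qquad \mathbf{Q} = \mathbf{E}_0\mathbf{W}_Q, \quad \mathbf{K} = \mathbf{E}\mathbf{W}_K, \quad \mathbf{V} = \mathbf{E}\mathbf{W}_V,
\]
where $\mathbf{E}$ ($2048 \times d$) and $\mathbf{E}_0$ ($256 \times d$) hold the embedded full and subsampled point sets, $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are learned projections, and the softmax is applied row-wise, so that $\mathbf{Z}$ contains one $d$-dimensional latent vector per query.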
All models are trained with a batch size of 16. The VAE is trained for 9,000 epochs with a piecewise-linear learning rate schedule that increases from $1 \times 10^{-6}$ to $1.5 \times 10^{-4}$ over the first 200 epochs and then decreases linearly to zero. The losses in Equation \ref{eq:train_objective} are weighted to have comparable magnitudes, with $\lambda_1 = \lambda_2 = 0.1$, $\lambda_3 = 1 \times 10^{-3}$, and $\lambda_4 = 1$. During VAE training, we apply random rotations of up to $\pm0.1$ radians to the point clouds, which substantially improves reconstruction quality.
The latent-diffusion model is trained for 6,000 epochs on TopCoW and 12,000 epochs on VascuSynth.
The same learning rate schedule is applied, with a maximum of $1 \times 10^{-4}$. All models require up to 2 hours and 4 GB of memory when trained on an NVIDIA TITAN Xp GPU.
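Assuming that the start value of $1 \times 10^{-6}$ and the 200-epoch warm-up also apply to the latent-diffusion model (only its maximum is stated), the learning rate at epoch $e$ of a run with $E$ epochs can be written as
\[
% Piecewise-linear warm-up and decay; eta_0 = 1e-6 and E_w = 200 epochs.
\eta(e) = \left\{ \begin{array}{ll}
\eta_0 + (\eta_{\max} - \eta_0)\,\frac{e}{E_w}, & 0 \le e \le E_w,\\[4pt]
\eta_{\max}\,\frac{E - e}{E - E_w}, & E_w < e \le E,
\end{array} \right.
\]
with $\eta_0 = 1 \times 10^{-6}$ and $E_w = 200$. For the VAE ($\eta_{\max} = 1.5 \times 10^{-4}$, $E = 9{,}000$), the rate at epoch 4,600 is, for example, $1.5 \times 10^{-4} \cdot 4{,}400 / 8{,}800 = 7.5 \times 10^{-5}$.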
Synthetic trees are sampled in 100 steps with $\rho = 8$ and $\texttt{S\_churn} = 25$. We refer to \citet{karras2022elucidating} for more details on the sampling algorithm and its hyperparameters.
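For reference, the noise-level discretization and stochastic churn of the sampler of \citet{karras2022elucidating} are summarized below for $N = 100$ steps. The remaining hyperparameters $\sigma_{\min}$, $\sigma_{\max}$, $S_{\mathrm{tmin}}$, $S_{\mathrm{tmax}}$, and $S_{\mathrm{noise}}$ are not stated here and are left symbolic.
\[
% Noise-level discretization following Karras et al. (2022); rho = 8 here.
\sigma_i = \left(\sigma_{\max}^{1/\rho} + \frac{i}{N - 1}\left(\sigma_{\min}^{1/\rho} - \sigma_{\max}^{1/\rho}\right)\right)^{\rho}, \quad i = 0, \dots, N - 1, \qquad \sigma_N = 0,
\]
\[
% Stochastic churn: temporarily raise the noise level before each denoising step.
\gamma_i = \left\{ \begin{array}{ll}
\min\left(S_{\mathrm{churn}} / N, \sqrt{2} - 1\right), & \sigma_i \in [S_{\mathrm{tmin}}, S_{\mathrm{tmax}}],\\
0, & \mbox{otherwise},
\end{array} \right.
\]
\[
\hat{\sigma}_i = (1 + \gamma_i)\,\sigma_i, \qquad \hat{\mathbf{x}}_i = \mathbf{x}_i + \sqrt{\hat{\sigma}_i^2 - \sigma_i^2}\;\epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(\mathbf{0}, S_{\mathrm{noise}}^2\mathbf{I}).
\]
With $S_{\mathrm{churn}} = 25$ and $N = 100$, $S_{\mathrm{churn}}/N = 0.25 < \sqrt{2} - 1 \approx 0.414$. Each step inside the churn interval therefore raises the noise level to $1.25\,\sigma_i$ before denoising, and a larger $\rho$ concentrates more of the steps near $\sigma_{\min}$.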
Meshes are extracted from the zero level-set of the SDF using the marching cubes algorithm \cite{lorensen1998marching}. To ensure topological consistency, i.e., the absence of disconnected vessel segments, we post-process the generated meshes to retain only the largest connected component, or the two largest in the case of TopCoW.

\subsection{Generative Performance Analysis} \label{sec:3_analysis}

\begin{figure}[t]
    \centering
    \includegraphics[width=\linewidth]{figures/3_qualitative_results_with_gt.png}
    \caption{Real and synthetic semantic vessel trees from TopCoW. The \textbf{top} row displays real trees, the \textbf{middle} row shows synthetic trees generated by the proposed method, and the \textbf{bottom} row presents results from \citet{kuipers2024generating}, including failed tree topology reconstructions.}
    \label{fig:3_qualitative_analysis}
\end{figure}

\begin{figure}[t]
    \centering
    \includegraphics[width=\linewidth]{figures/3_topcow_characteristics.png}
    \caption{Comparison of the distributions of vessel length, average radius, and tortuosity for each individual vessel in the real and synthetic TopCoW trees.}
    \label{fig:3_topcow_characteristics}
\end{figure}

\paragraph{Semantic Vessel Tree Generation} We generate a set of semantic TopCoW vessel trees and compare our approach to \citet{kuipers2024generating}. The qualitative results in Figure \ref{fig:3_qualitative_analysis} show that our method successfully generates the circle of Willis anatomy and that its different variations are well represented in the synthetic samples. In contrast, the method of \citet{kuipers2024generating} fails to construct the tree topology properly, resulting in unrealistic circle of Willis trees. This is primarily because its rule-based post-processing algorithm cannot reconstruct the tree topology in the presence of excessive noise and inconsistencies in the generated point clouds. Such failures occurred in approximately 90\% of its generated samples, whereas none occurred with the proposed method, which directly generates the entire tree as a single signed distance field.

To further assess the quality of our synthetic semantic trees, we automatically extract the length, average radius, and tortuosity of each vessel by skeletonizing the generated SDFs. The radius at each centerline point is computed as the shortest distance from that point to the vessel surface. We compare the distributions of these vessel characteristics between the real and synthetic TopCoW trees. The results in Figure \ref{fig:3_topcow_characteristics} reveal distinct geometric differences between the individual vessels in the real population, and these differences, including outliers, are accurately reflected in the synthetic population. This suggests that our model generates synthetic vessel trees that are diverse and representative of the real population. The L-Pcom, R-Pcom, and Acom appear to be the most challenging vessels to generate, likely because they occur in only a small subset of the real trees. Moreover, the Acom is particularly short. Because synthetic shapes are sampled at a lower voxel resolution than the TopCoW segmentations, its skeleton often consists of only one or two voxels, resulting in zero tortuosity. Appendix \ref{sec:5_centerline_characteristics} presents the vessel characteristics obtained with the method of \citet{kuipers2024generating}, which deviate substantially from those of the real TopCoW trees, especially in length and tortuosity. These deviations stem from the failure of its post-processing algorithm to reconstruct the tree topology accurately.
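For concreteness, given the ordered centerline points $\mathbf{p}_1, \dots, \mathbf{p}_M$ of a single vessel, these characteristics can be computed as sketched below. The tortuosity definition (arc length over chord length, minus one) is an assumed variant. It was chosen because it yields zero for a straight segment, which agrees with the zero tortuosity noted above for two-voxel skeletons.
\[
% Assumed per-vessel definitions; p_1..p_M are ordered centerline points.
\ell = \sum_{j=1}^{M-1} \|\mathbf{p}_{j+1} - \mathbf{p}_j\|_2, \qquad \tau = \frac{\ell}{\|\mathbf{p}_M - \mathbf{p}_1\|_2} - 1,
\]
\[
\bar{r} = \frac{1}{M}\sum_{j=1}^{M} r(\mathbf{p}_j), \qquad r(\mathbf{p}) = \min_{\mathbf{q} \in \partial\Omega} \|\mathbf{p} - \mathbf{q}\|_2,
\]
where $\partial\Omega$ denotes the vessel surface. For an exact signed distance function $\phi$, $r(\mathbf{p}) = |\phi(\mathbf{p})|$. The radius can therefore in principle be read directly from the generated field, up to its approximation error.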

\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.45\textwidth} % Top alignment
        \centering
        \captionsetup{type=table} % Ensure this caption is treated as a table caption
        \caption{Quantitative comparison on VascuSynth, reported as mean $\pm$ standard deviation over three evaluations. For 1-NNA, 50\% is optimal. For COV, higher is better.} % Caption above the table
        \label{tab:3_quantitatives}
        \begin{tabular}{c|cc}
            \toprule
            Method & 1-NNA (\%) & COV $\uparrow$\\
            \midrule
            TrIND  & $87.4\scriptstyle\pm8.4$ & $0.5\scriptstyle\pm0.1$ \\
            Ours   & $\mathbf{57.0}\scriptstyle\pm2.8$ & $\mathbf{0.7}\scriptstyle\pm0.1$ \\
            \bottomrule
        \end{tabular}
    \end{minipage}%
    \hfill
    \begin{minipage}[t]{0.5\textwidth} % Top alignment
        \centering
        \includegraphics[width=\textwidth]{figures/3_qualitative_vascusynth.png}
        \caption{Synthetic VascuSynth trees generated with our model.}
        \label{fig:3_qualitative_vasc}
    \end{minipage}
\end{figure}

\paragraph{Baseline Tree Generation Performance} We compare our method to TrIND \cite{sinha2024representing}, an implicit neural shape (INS) method that generates non-semantic VascuSynth trees using occupancy grids as its implicit shape representation. We report 1-nearest-neighbor accuracy (1-NNA) and coverage (COV) to measure representativeness and diversity, respectively. The results in Table \ref{tab:3_quantitatives} show that our model outperforms TrIND on both metrics. We attribute this to our use of a single encoder for all shapes. The resulting weight sharing yields a robust tree distribution that is better suited for sampling than TrIND's distribution of individually trained INS weights. Figure \ref{fig:3_qualitative_vasc} demonstrates that our model can generate varied, high-quality tree structures. Latent-space interpolations, shown in Figure \ref{fig:5_interpolation} in the appendix, further indicate that our model learns a robust vessel tree representation.

\paragraph{Synthetic Vessel Tree Uniqueness} We assess uniqueness using the Chamfer distance between the most similar shapes within the train set (intra-distances) and between the synthetic and train sets (inter-distances). For VascuSynth, Figure \ref{fig:3_distances} shows a wide distribution of inter-distances compared to the intra-distances, indicating a high degree of uniqueness. For TopCoW, the inter-distances are generally lower than the intra-distances, indicating that the synthetic trees are less unique. This is likely due to the greater similarity among real TopCoW trees. As a result, the synthetic trees tend to lie ``in-between'' the real trees, leading to lower inter-distances. Nonetheless, Figure \ref{fig:3_qualitative_uniqueness} demonstrates that our model is capable of generating unique trees for both TopCoW and VascuSynth.
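The intra- and inter-distances can be written as follows, assuming the common symmetric Chamfer distance between surface point sets. Whether squared or unsquared Euclidean distances are used is also an assumption here.
\[
% Assumed symmetric Chamfer distance between point sets X and Y.
\mathrm{CD}(X, Y) = \frac{1}{|X|}\sum_{\mathbf{x} \in X}\min_{\mathbf{y} \in Y}\|\mathbf{x} - \mathbf{y}\|_2^2 + \frac{1}{|Y|}\sum_{\mathbf{y} \in Y}\min_{\mathbf{x} \in X}\|\mathbf{x} - \mathbf{y}\|_2^2,
\]
\[
d_{\mathrm{intra}}(T) = \min_{T' \in \mathcal{T} \setminus \{T\}} \mathrm{CD}(T, T'), \qquad d_{\mathrm{inter}}(S) = \min_{T \in \mathcal{T}} \mathrm{CD}(S, T),
\]
where $\mathcal{T}$ is the train set, $T$ is a real tree, and $S$ is a synthetic tree. Inter-distances that are small relative to the intra-distances indicate synthetic trees that lie closer to some training tree than real trees typically lie to each other.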


\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.5\textwidth} % Top alignment
        \centering
        \includegraphics[width=\textwidth]{figures/3_distances_between_samples.png}
        \caption{Intra- and inter-distances: Chamfer distances (CDs) between the most similar trees within the train set and between the synthetic and train sets, respectively.}
        \label{fig:3_distances}
    \end{minipage}%
    \hfill
    \begin{minipage}[t]{0.45\textwidth} % Top alignment
        \centering
        \includegraphics[width=\textwidth]{figures/3_qualitative_uniqueness2.png}
        \caption{Synthetic TopCoW and VascuSynth trees, each overlaid with the most similar tree from the train set (gray).}
        \label{fig:3_qualitative_uniqueness}
    \end{minipage}
\end{figure}


% % \subsection{Limitations and Future Work} \label{sec:3_discussion}

% % The key advantage of our self-supervised framework is its adaptability to any surface representation.
% % However, because the SDF is evaluated only on the shape surface, the model can theoretically generate additional zero level set surfaces without penalty.
% % This limitation hinders the learning of high-frequency surfaces, which typically rely on Fourier embeddings of input coordinates \cite{randomgaussianembeddingpaper}.
% % The periodicity of these embeddings can lead to potential repetition in shape patterns (see Appendix \ref{sec:5_embedding}).
% % While organic shapes are generally smooth and characterized by low-frequency features, this constraint reduces the model's ability to represent complex surfaces.
% % Future work could address this by introducing additional constraints, such as evaluating a spherical bounding box around the target shape.
% % Distances from this boundary to the shap surface could be estimated a priori, restricting the model to learn only the zero level set for the desired surface.