\appendix

\section{TopCoW-2024 Circle of Willis Map} \label{sec:5_map}

\begin{figure}[t]
    \centering
    \includesvg[inkscapelatex=false, width = \linewidth]{figures/5_topcow_map.svg}
    \caption{Anatomical map of the Circle of Willis as represented in TopCoW \cite{topcowchallenge}.}
    \label{fig:5_topcow_map}
\end{figure}

Figure \ref{fig:5_topcow_map} presents a schematic overview of the complete Circle of Willis vascular structure from the TopCoW \cite{topcowchallenge} dataset. The dataset includes many anatomical variations, often comprising only subsets of the vessels shown here. Most variability in TopCoW tree geometry arises from differing combinations of the left/right Pcom, Acom, and 3rd-A2 vessels.

\section{Vessel Characteristics from \citet{kuipers2024generating}} \label{sec:5_centerline_characteristics}


\begin{figure}[h]
    \centering
    \includegraphics[width=\linewidth]{figures/5_topcow_characteristics_centerline.png}
    \caption{Comparison of the distributions of length, average radius, and tortuosity for individual vessels between real TopCoW trees and synthetic trees generated by the method of \citet{kuipers2024generating}.}
    \label{fig:5_centerline_characteristics}
\end{figure}

Figure \ref{fig:5_centerline_characteristics} presents the geometric vessel characteristics (length, average radius, and tortuosity) of synthetic TopCoW vessel trees generated by the method proposed in \citet{kuipers2024generating}. Consistent with the qualitative observations in Figure \ref{fig:3_qualitative_analysis}, the synthetic trees differ substantially from the real trees, particularly in vessel length and tortuosity. The method of \citet{kuipers2024generating} operates in two stages: first, it generates the vessel tree outline as a semantic point cloud; second, it applies a rule-based algorithm to reconstruct the tree topology from the unordered point cloud. This reconstruction involves sequencing points within individual vessels and then establishing bifurcation points between vessels. The algorithm's performance depends heavily on dense, equidistantly spaced points and well-defined vessel segments. Due to the complex topology of the Circle of Willis, this rule-based approach struggles to generalize and often requires case-specific tuning to reconstruct the tree topology accurately. Failures in reconstruction introduce sharp, anatomically improbable vessel angles, which account for the large discrepancies in length and tortuosity between synthetic and real trees.

\section{Model Architectures} \label{sec:5_architectures}

\begin{figure}[t]
    \centering
    \includesvg[inkscapelatex=false, width = \linewidth]{figures/5_autoencoder.svg}
    \caption{Architecture of the variational autoencoder.}
    \label{fig:autoencodermodel}
\end{figure}

\begin{figure}[t]
    \centering
    \includesvg[inkscapelatex=false, width = 0.7\linewidth]{figures/5_generator.svg}
    \caption{Architecture of the diffusion model.}
    \label{fig:generatormodel}
\end{figure}

\paragraph{Variational Autoencoder Architecture}
Figure \ref{fig:autoencodermodel} illustrates the architecture of our variational autoencoder. The encoder input is a semantic point cloud consisting of 3D $(x, y, z)$ coordinates and corresponding one-hot encoded semantic labels. This input is encoded into a set of shape latent variables. The decoder receives these shape latents along with query coordinate points and predicts both a signed distance and a semantic label for each query. Thus, the decoder functions as a conditional semantic signed distance function, with the shape latents serving as the conditioning variables.
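As a notational sketch (the symbols below are introduced here for illustration only and do not appear in the figure), the decoder can be viewed as a function
\begin{align}
    f_\theta(q, Z) = (\hat{s}, \hat{p}), \qquad q \in \mathbb{R}^3,\; \hat{s} \in \mathbb{R},\; \hat{p} \in \Delta^{C-1},
\end{align}
where $Z$ denotes the set of shape latents, $\hat{s}$ is the predicted signed distance at the query point $q$, and $\hat{p}$ is a probability vector over an assumed number $C$ of semantic classes, with $\Delta^{C-1}$ the probability simplex. Under this view, the vessel surface corresponds to the zero level set $\{q : \hat{s} = 0\}$, and each surface point can be assigned the label $\arg\max_c \hat{p}_c$.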

\paragraph{Shape Latent Diffusion Architecture}
Figure \ref{fig:generatormodel} illustrates the architecture of the shape latent diffusion model. The model takes a set of shape latents as input. During the forward diffusion process, a noise level is sampled and used to generate noise, which is added to the shape latents. The model then denoises these noisy latents, conditioned on the sampled noise level. New shape latents can be generated by denoising noise drawn from a unit Gaussian. When decoded by the decoder network, these latents produce novel synthetic shapes.
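As a hedged sketch, assuming an additive Gaussian noising scheme (the symbols $z$, $\sigma$, $\epsilon$, and $D_\phi$ are introduced here for illustration), the noising step and one common choice of denoising objective can be written as
\begin{align}
    \tilde{z} &= z + \sigma\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
    \mathcal{L}(\phi) &= \mathbb{E}_{z, \sigma, \epsilon}\left[\left\| D_\phi(\tilde{z}; \sigma) - z \right\|_2^2\right],
\end{align}
where $z$ denotes the clean shape latents, $\sigma$ the sampled noise level, and $D_\phi$ the denoising network conditioned on $\sigma$. The distribution of $\sigma$ and any loss weighting are left unspecified in this sketch and may differ in practice.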


\section{Interpreting the Latent-Diffusion Shape Space} \label{sec:5_interpolation}

\begin{figure}[h]
    \centering
    \includegraphics[width=\linewidth]{figures/5_interpolation.png}
    \caption{Interpolation in the diffusion latent space between a VascuSynth \cite{hamarneh2010vascusynth} tree with a single bifurcation and a tree with a large number of bifurcations.}
    \label{fig:5_interpolation}
\end{figure}

To analyze the learned shape representations, we interpolate between two sampled VascuSynth \cite{hamarneh2010vascusynth} shapes: a simple tree with a single bifurcation and a complex tree with many bifurcations. We linearly interpolate between the diffusion model's input noise vectors for the two shapes and denoise each interpolated vector. The resulting shapes over ten interpolation steps are shown in Figure \ref{fig:5_interpolation}. As we move through the latent space, the initial branches elongate and begin to bifurcate, suggesting that the model has learned a robust representation of tree structures. The interpolation preserves the original shape while gradually increasing its complexity.
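Written out, and assuming evenly spaced interpolation weights (an assumption; the symbols are introduced here for illustration), the $k$-th interpolated noise vector is
\begin{align}
    \epsilon_k = (1 - \lambda_k)\,\epsilon_a + \lambda_k\,\epsilon_b, \qquad \lambda_k = \frac{k}{K - 1}, \quad k = 0, \dots, K - 1,
\end{align}
where $\epsilon_a$ and $\epsilon_b$ are the input noise vectors of the simple and the complex tree, respectively, and $K = 10$. Each $\epsilon_k$ is then denoised into shape latents and decoded into a shape.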

\section{1-Nearest Neighbor Accuracy and Coverage}\label{sec:5_metrics}

The 1-nearest neighbor accuracy (1-NNA) measures representativeness by quantifying how similar the real and synthetic shape distributions are. Each shape is classified as real or synthetic according to the set to which its nearest neighbor belongs, where nearness is measured by a distance metric, in our case the Chamfer distance (CD) between point clouds. An accuracy of 50\% indicates that the real and synthetic distributions are indistinguishable, as half of the real shapes are classified as synthetic and vice versa. Following \citet{erkocc2023hyperdiffusion}, we formulate 1-NNA between a set of reference shapes $S_r$ and a set of generated shapes $S_g$ as
\begin{align}
    \text{1-NNA}(S_r, S_g) = \frac{\sum_{X \in S_r}\mathbb{I}[N_X \in S_r] + \sum_{Y \in S_g}\mathbb{I}[N_Y \in S_g]}{|S_r| + |S_g|},
\end{align}
where $|\cdot|$ denotes set cardinality, $\mathbb{I}$ is the indicator function, and $N_X$ is the point cloud closest to $X$ in the union of the reference set and the synthetic set, excluding $X$ itself:
\begin{align}
    N_X = \argmin_{K \in (S_r \cup S_g) \setminus \{X\}}\text{CD}(X, K).
\end{align}
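The Chamfer distance admits several variants (squared or unsquared distances, sums or means). A commonly used form, given here only as an illustrative assumption, is
\begin{align}
    \text{CD}(X, K) = \sum_{x \in X}\min_{k \in K}\|x - k\|_2^2 + \sum_{k \in K}\min_{x \in X}\|x - k\|_2^2,
\end{align}
where $x$ and $k$ range over the points of the point clouds $X$ and $K$, respectively.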
Coverage (COV) evaluates diversity by finding the nearest real neighbor of each synthetic sample and computing the ratio of the number of unique real neighbors to the total number of real samples. We formulate COV between a set of reference shapes $S_r$ and a set of generated shapes $S_g$ as
\begin{align}
    \text{COV}(S_r, S_g) = \frac{1}{|S_r|}\left|\left\{ \argmin_{X \in S_r}\text{CD}(X,Y) \;\middle|\; Y \in S_g\right\}\right|.
\end{align}
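As a toy illustration with hypothetical distances (not taken from our experiments), let $S_r = \{R_1, R_2\}$ and $S_g = \{G_1, G_2\}$ with
\begin{align}
    \text{CD}(R_1, R_2) &= 1, & \text{CD}(R_1, G_1) &= 2, & \text{CD}(R_2, G_1) &= 3, \nonumber\\
    \text{CD}(G_1, G_2) &= 4, & \text{CD}(R_1, G_2) &= 5, & \text{CD}(R_2, G_2) &= 6. \nonumber
\end{align}
Excluding each shape itself, the nearest neighbors are $N_{R_1} = R_2$, $N_{R_2} = R_1$, $N_{G_1} = R_1$, and $N_{G_2} = G_1$. Three of the four shapes have a nearest neighbor from their own set, so 1-NNA is $3/4 = 75\%$. For coverage, the nearest reference shape of both $G_1$ and $G_2$ is $R_1$, so only one of the two reference shapes is matched and COV is $1/2 = 50\%$.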

