
\section{Method} \label{sec:method}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figures/method/model_overview.png}
    \caption{Overview of the generative process. \textbf{(a)} A noisy point cloud is sampled and a condition label is provided. \textbf{(b)} The noisy point cloud is iteratively denoised during the reverse diffusion process to form a vessel centerline tree. \textbf{(c)} The unordered points are sequenced to form the final centerline vessel tree.}
    \label{fig:overview}
\end{figure}

Section~\ref{sec:method:data} describes the dataset and how we represent the centerlines for training the diffusion model. The diffusion process and its set-transformer backbone are described in Section~\ref{sec:method:diffusion}. Section~\ref{sec:method:ordering} presents the sequencing algorithm that orders the synthetic centerlines sampled from the diffusion model. An overview of the entire generative process is presented in Figure~\ref{fig:overview}. We make our code publicly available \href{https://github.com/ThijsKuipers1995/vessel_diffusion}{here}.

\subsection{Data} \label{sec:method:data}

In this work, we used data from the MR CLEAN Registry, an ongoing, prospective, observational, multicenter study from 16 EVT-capable hospitals in the Netherlands. The dataset consists of 110 patients with an occlusion in the M1 artery, on either the left or right side of the brain, and sufficient image and segmentation quality. We segmented the intracranial arteries using a vessel segmentation algorithm developed within StrokeViewer (NICO.LAB, Amsterdam). Extraction of the centerline and geometry characteristics (radius, tortuosity, and bifurcation angle) and artery segment labeling were performed with the semi-automated software iCAFE (© 2016-2018 University of Washington. Used with permission) \cite{chen2018development}. The vessel trees include the ICA, ACA, and MCA (M1 and M2 segments) on the contralateral hemisphere (without occlusion). To increase the number of data samples, we mirrored vessel trees with an occlusion on the left side to the right side. Hence, our dataset consists of 181 training samples (95 with and 86 without an M1 occlusion) and 39 testing samples (20 with and 19 without occlusion).

The vessel trees are parameterized by their centerline, i.e., a set of points $\mathbf{x}_i = (\mathbf{c}_i, \mathbf{h}_i)$, where $\mathbf{c}_i$ contains the point coordinates and $\mathbf{h}_i$ the point features, such as the radius and vessel type. The vessel type is a categorical feature represented as a one-hot encoding. Similar to \citet{hoogeboom2022equivariant}, we multiply the categorical features by 0.1 to encourage the denoising process to first emphasize the shape of the centerline before segmenting it. Note that in practice, $\mathbf{x}_i$ is the concatenation of $\mathbf{c}_i$ and $\mathbf{h}_i$. We sample 256 equidistantly spaced centerline points using linear interpolation. The coordinates are scaled down by a factor of 24 so that they lie approximately within the range $[-1, 1]$ and have a standard deviation of 0.5, as expected by the EDM diffusion formulation \cite{karras2022elucidating}.
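
The preprocessing described above can be sketched as follows. This is a minimal illustration rather than the released implementation: the function names \texttt{resample\_centerline} and \texttt{build\_point\_set} are hypothetical, a single polyline is resampled (how the 256 points are distributed over the segments of a tree is not specified here), and the radius is left unscaled as an assumption.

```python
import numpy as np

def resample_centerline(points, n_samples=256):
    """Resample a polyline to n_samples points equally spaced along its arc length."""
    points = np.asarray(points, dtype=float)
    seg_len = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])   # cumulative arc length
    targets = np.linspace(0.0, arc[-1], n_samples)      # equidistant positions
    # Linear interpolation of every coordinate axis along the arc length.
    return np.stack(
        [np.interp(targets, arc, points[:, k]) for k in range(points.shape[1])],
        axis=1,
    )

def build_point_set(coords, radii, vessel_types, n_types,
                    coord_scale=24.0, cat_scale=0.1):
    """Concatenate scaled coordinates, radius, and down-weighted one-hot labels."""
    coords = np.asarray(coords, dtype=float) / coord_scale        # roughly [-1, 1]
    one_hot = np.eye(n_types)[np.asarray(vessel_types)] * cat_scale
    radii = np.asarray(radii, dtype=float)[:, None]               # assumption: unscaled
    return np.concatenate([coords, radii, one_hot], axis=1)
```

Each row of the returned array corresponds to one $\mathbf{x}_i$, i.e., the concatenation of $\mathbf{c}_i$ and $\mathbf{h}_i$.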

\subsection{Conditional Set-Diffusion} \label{sec:method:diffusion}

The diffusion model consists of three parts. The forward diffusion process adds noise to the input. A denoising function aims to remove the added noise to reconstruct the original input. The reverse diffusion process synthesizes a vessel tree by iteratively denoising a sample drawn from a unit Gaussian distribution. The reverse diffusion process requires 18 steps to synthesize a centerline vessel tree.

\paragraph{Diffusion Process}

We use the EDM formulation introduced by \citet{karras2022elucidating}. EDM drastically simplifies the forward diffusion process and the training of the denoising function. Sampling from the diffusion model is also significantly faster, requiring only 18 steps compared to the hundreds of steps required by approaches such as \cite{hoogeboom2022equivariant}.
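
To make the 18-step reverse process concrete, the sketch below builds the noise-level schedule proposed in the EDM paper and runs a plain Euler sampler over it. Several points are assumptions: the values $\sigma_\text{min} = 0.002$, $\sigma_\text{max} = 80$, and $\rho = 7$ are the defaults from \citet{karras2022elucidating} and may differ from the ones used in this work, the Euler update replaces the second-order Heun sampler of the EDM paper for brevity, and \texttt{denoiser} is a hypothetical callable standing in for the trained network.

```python
import numpy as np

def edm_sigma_schedule(n_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Decreasing noise levels sigma_0 > ... > sigma_{n_steps-1}, then a final 0."""
    ramp = np.arange(n_steps) / (n_steps - 1)
    hi, lo = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    sigmas = (hi + ramp * (lo - hi)) ** rho
    return np.append(sigmas, 0.0)

def euler_sample(denoiser, shape, cond=None, n_steps=18, seed=0):
    """Deterministic Euler sampler; `denoiser(x, sigma, cond)` predicts the clean set."""
    rng = np.random.default_rng(seed)
    sigmas = edm_sigma_schedule(n_steps)
    # Start from unit Gaussian noise scaled to the largest noise level.
    x = rng.normal(size=shape) * sigmas[0]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, sigma, cond)) / sigma   # dx/dsigma
        x = x + (sigma_next - sigma) * slope             # step to the next level
    return x
```

With this schedule, the loop performs exactly \texttt{n\_steps} denoiser evaluations, and the last step moves to $\sigma = 0$.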

\paragraph{Denoising Objective}

Given a centerline point $\mathbf{x}_i = (\mathbf{c}_i, \mathbf{h}_i)$ and noise $\mathbf{n}_i$, the objective of the denoising function is to map the diffused input back to the original input. The amount of noise $\mathbf{n}_i$ added to the input is determined by the noise level $\sigma$. As $\sigma$ increases, the noisy input increasingly resembles unit Gaussian noise. Formally, the denoising objective is
\begin{align}
    \mathbb{E}_{\mathbf{n}_i \thicksim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})}\left[\frac{1}{N}\sum_{i=1}^N\left\lVert F_\mathbf{\theta}(\mathbf{x}_i + \mathbf{n}_i, \sigma, C) - \mathbf{x}_i\right\rVert^2\right].
\end{align}
Here, $F_\mathbf{\theta}$ is the denoising function with learnable parameters $\mathbf{\theta}$. During training, $F_\mathbf{\theta}$ is conditioned on the noise level $\sigma$ and any optional auxiliary conditional information $C$. We use learnable embedding vectors for conditioning on the presence of an M1 occlusion. The denoising function $F_\mathbf{\theta}$ is modeled by a set-transformer.
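
A single-sample Monte Carlo estimate of this objective can be sketched as below. The name \texttt{denoising\_loss} and the \texttt{denoiser} callable are hypothetical placeholders for $F_\mathbf{\theta}$, and the EDM loss weighting and preconditioning are omitted, so this only illustrates the expectation written above for one fixed $\sigma$.

```python
import numpy as np

def denoising_loss(denoiser, x, sigma, cond, rng):
    """One-sample estimate of the denoising objective for a point set x of shape (N, D)."""
    noise = rng.normal(0.0, sigma, size=x.shape)   # n_i ~ N(0, sigma^2 I)
    pred = denoiser(x + noise, sigma, cond)        # F_theta(x_i + n_i, sigma, C)
    # Squared error per point, averaged over the N points of the set.
    return np.mean(np.sum((pred - x) ** 2, axis=1))
```

A denoiser that returns its noisy input unchanged incurs an expected loss of $D\sigma^2$ for $D$-dimensional points, while a perfect denoiser attains zero.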

\paragraph{Cross Attention}

Our denoising network consists of a series of cross-attention blocks. The attention mechanism allows elements in the input to pass information to each other while being permutation equivariant \cite{vaswani2017attention}. Given matrices $\mathbf{X} \in \mathbb{R}^{N\times L}$ and $\mathbf{Y} \in \mathbb{R}^{M\times P}$, with rows denoting individual set elements, we formulate cross-attention as
\begin{align} \label{eq:attention}
    \text{CrossAttn}(\mathbf{X}, \mathbf{Y}) &= \mathbf{A}(\mathbf{Y}\mathbf{W}_V) \\
    \mathbf{A} &= \text{Softmax}\left(\frac{\mathbf{X}\mathbf{W}_Q  (\mathbf{Y}\mathbf{W}_K)^T}{\sqrt{d}}\right),
\end{align}
where $\mathbf{W}_Q \in \mathbb{R}^{L\times d}$ and $\mathbf{W}_K, \mathbf{W}_V \in \mathbb{R}^{P \times d}$ are learnable parameters mapping $\mathbf{X}$ and $\mathbf{Y}$ to sets of queries, keys, and values, respectively. In the case where $\mathbf{X} = \mathbf{Y}$, \equationref{eq:attention} reduces to self-attention.
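
The cross-attention operation can be written in a few lines of NumPy. This is a single-head sketch with a hypothetical function name \texttt{cross\_attention}; the actual network may use multiple heads and additional projections.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(X, Y, W_q, W_k, W_v):
    """X: (N, L) queries' source, Y: (M, P) keys'/values' source, output: (N, d)."""
    d = W_q.shape[1]
    scores = (X @ W_q) @ (Y @ W_k).T / np.sqrt(d)   # (N, M) attention logits
    A = softmax(scores, axis=-1)                    # each row sums to one
    return A @ (Y @ W_v)
```

Permuting the rows of $\mathbf{X}$ permutes the output rows in the same way (permutation equivariance), whereas permuting the rows of $\mathbf{Y}$ leaves the output unchanged.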

\paragraph{Set Transformer} The set transformer consists of a series of cross-attention blocks. Each cross-attention block consists of three components. First, self-attention is applied where centerline elements exchange information with each other. Next, conditional information is incorporated via cross-attention, serving as an effective conditioning mechanism \cite{rombach2022high}. In the case of unconditional generation, the cross-attention layer becomes a self-attention layer. Finally, an inverse-bottleneck feed-forward network performs channel mixing. Adaptive layer normalization is applied before each component to inject the noise levels. Pre-normalization in the transformer architecture improves gradient stability, reducing training time and the need for hyperparameter tuning \cite{xiong2020layer}.
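
One such block can be sketched as follows. Several details are assumptions made for illustration only: residual connections around each component, single-head attention, a ReLU feed-forward network with expansion factor 4, adaptive layer normalization implemented as a scale and shift predicted linearly from a noise-level embedding, and condition embeddings that share the feature width $d$. All function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Y, w):
    """Single-head attention: queries from X, keys and values from Y."""
    d = w["q"].shape[1]
    A = softmax((X @ w["q"]) @ (Y @ w["k"]).T / np.sqrt(d))
    return A @ (Y @ w["v"])

def ada_layer_norm(X, sigma_emb, w, eps=1e-5):
    """Layer norm whose scale and shift are predicted from the noise embedding."""
    Xn = (X - X.mean(-1, keepdims=True)) / np.sqrt(X.var(-1, keepdims=True) + eps)
    return Xn * (1.0 + sigma_emb @ w["scale"]) + sigma_emb @ w["shift"]

def init_block(d, e, rng, expansion=4):
    """Random parameters for one block (d: feature width, e: noise-embedding width)."""
    def w(rows, cols):
        return rng.normal(0.0, rows ** -0.5, size=(rows, cols))
    p = {name: {k: w(d, d) for k in ("q", "k", "v")} for name in ("self", "cross")}
    p.update({name: {"scale": w(e, d), "shift": w(e, d)}
              for name in ("norm1", "norm2", "norm3")})
    p["ffn"] = {"up": w(d, expansion * d), "down": w(expansion * d, d)}
    return p

def set_transformer_block(X, C, sigma_emb, p):
    """X: (N, d) point features, C: (M, d) condition embeddings or None."""
    H = ada_layer_norm(X, sigma_emb, p["norm1"])
    X = X + attention(H, H, p["self"])                       # points exchange information
    H = ada_layer_norm(X, sigma_emb, p["norm2"])
    X = X + attention(H, H if C is None else C, p["cross"])  # conditioning (or self-attention)
    H = ada_layer_norm(X, sigma_emb, p["norm3"])
    return X + np.maximum(H @ p["ffn"]["up"], 0.0) @ p["ffn"]["down"]  # channel mixing
```

Passing \texttt{C=None} corresponds to the unconditional case, in which the second attention layer attends to the points themselves.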

% \figureref{fig:model_architecture} gives an overview of the network architecture.

% \paragraph{Equidistant Point Distribution Regularization}

% We regularize points to be distributed equidistantly to encourage the denoising function to learn a more general shape representation. Let $d_{ik}$ be the distance between a point $p_i$ and its $k$-th closest neighbor. Given a set of points $P$, we define the equidistant point distribution regularization $\mathcal{L}_{\text{eq}}(P)$ as
% \begin{align}
%     \mathcal{L}_{\text{eq}}(P) = \frac{1}{K}\sum_k^K\sqrt{\frac{1}{N}\sum_i^N\left(d_{ik} - \bar{d}_k\right)^2},
% \end{align}
% where $\bar{d}_k$ is the average $k$-th nearest neighbor distance.


\subsection{Unordered Centerline Sequencing} \label{sec:method:ordering}

Our generative model generates an unordered set of centerline points. We turn the unordered sets into ordered and cleaned-up connected centerline segments with the following three-stage post-processing algorithm.

\paragraph{(1) Noise Reduction} Noise mainly occurs if points are far away from the centerline or if points form clusters. Points that have a nearest-neighbor distance (nn-distance) larger than four times the average nn-distance are removed. Clusters, generally occurring at bifurcations, are reduced by applying furthest-point sampling.
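
A minimal sketch of this stage is given below. The function names are hypothetical, and the number of points retained by furthest-point sampling (\texttt{n\_samples}) is an assumption, since it is not specified above.

```python
import numpy as np

def nn_distances(points):
    """Distance from every point to its nearest neighbor (brute force)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)   # ignore the distance of a point to itself
    return dist.min(axis=1)

def remove_outliers(points, factor=4.0):
    """Drop points whose nn-distance exceeds `factor` times the average nn-distance."""
    nn = nn_distances(points)
    return points[nn <= factor * nn.mean()]

def farthest_point_sampling(points, n_samples, start=0):
    """Greedily keep the point farthest from all points selected so far."""
    chosen = [start]
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]
```

Because furthest-point sampling prefers points that are far from those already selected, near-duplicate points inside a cluster are the first to be left out.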

\paragraph{(2) Sequencing} Sequencing starts with an empty sequence $s$ to which points are appended one at a time. We define the last point added to the sequence as the endpoint and the remaining points as candidate points. The candidate point with the minimum distance to the endpoint, weighted by the current direction of the sequence, is chosen as the next endpoint. Given the current endpoint and direction $\mathbf{x}_\text{cur}$ and $\mathbf{d}_\text{cur}$, and a candidate point $\mathbf{x}_i$ with direction $\mathbf{d}_i$, the direction-weighted distance $\text{d}$ is calculated as
\begin{align}
    \text{d}(\mathbf{x}_\text{cur}, \mathbf{x}_i) = (1 + \alpha\mathbf{d}_\text{cur}^T\mathbf{d}_i)\lVert\mathbf{x}_i - \mathbf{x}_\text{cur}\rVert,
\end{align}
where $\alpha$ determines the importance of the current direction. The directions are calculated as $\mathbf{d}_\text{cur} = (\mathbf{x}_\text{prior} - \mathbf{x}_\text{cur}) / \lVert\mathbf{x}_\text{prior} - \mathbf{x}_\text{cur}\rVert$ and $\mathbf{d}_i = (\mathbf{x}_i - \mathbf{x}_\text{cur}) / \lVert\mathbf{x}_i - \mathbf{x}_\text{cur}\rVert$, where $\mathbf{x}_\text{prior}$ is the point preceding $\mathbf{x}_\text{cur}$ in the sequence. Because $\mathbf{d}_\text{cur}$ points back along the sequence, a candidate that continues in the current direction yields a negative inner product and thus, for $\alpha > 0$, a smaller weighted distance. The initial endpoint is the point $\mathbf{x}$ that maximizes the average pairwise inner product between the directions from $\mathbf{x}$ to its $k$ nearest neighbors. Intuitively, a point whose nearest neighbors all lie in similar directions is likely one of the endpoints of the sequence.
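
The greedy procedure can be sketched for a single segment as follows. The function names, the values $\alpha = 0.25$ and $k = 3$, and the use of the plain nearest neighbor for the very first step (when no prior point exists) are assumptions. The sketch also assumes there are no duplicate points and omits any termination criterion for splitting the point set into multiple segments.

```python
import numpy as np

def pick_start(points, k=3):
    """Index of the point whose directions to its k nearest neighbors agree most."""
    best, best_score = 0, -np.inf
    for i, p in enumerate(points):
        dist = np.linalg.norm(points - p, axis=1)
        nbrs = np.argsort(dist)[1:k + 1]              # skip the point itself
        dirs = (points[nbrs] - p) / dist[nbrs, None]  # unit directions to neighbors
        score = (dirs @ dirs.T)[np.triu_indices(k, 1)].mean()
        if score > best_score:
            best, best_score = i, score
    return best

def sequence_points(points, alpha=0.25, k=3):
    """Return an ordering of point indices using the direction-weighted distance."""
    order = [pick_start(points, k)]
    remaining = [i for i in range(len(points)) if i != order[0]]
    while remaining:
        cur = points[order[-1]]
        offsets = points[remaining] - cur
        dist = np.linalg.norm(offsets, axis=1)
        if len(order) > 1:
            back = points[order[-2]] - cur
            d_cur = back / np.linalg.norm(back)        # points back along the sequence
            cosine = (offsets / dist[:, None]) @ d_cur
            dist = (1.0 + alpha * cosine) * dist       # direction-weighted distance
        order.append(remaining.pop(int(np.argmin(dist))))
    return order
```

Applied to a shuffled set of collinear points, the returned ordering traverses the line from one end to the other.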

\paragraph{(3) Segment Merging} Individual vessel segments are merged by calculating a common bifurcation point. Given an endpoint $\mathbf{x}$ belonging to segment $s$, its nearest neighbors from the remaining segments are candidate bifurcation points. Candidate points with distances greater than four times the average nn-distance of $s$ are discarded. The common bifurcation point is the average of the remaining candidate points.
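
The computation of a common bifurcation point can be sketched as below. The function names are hypothetical, and two details are assumptions: one nearest neighbor is taken per remaining segment, and the endpoint itself is not included in the average.

```python
import numpy as np

def mean_nn_distance(segment):
    """Average nearest-neighbor distance within one segment of shape (N, 3)."""
    dist = np.linalg.norm(segment[:, None, :] - segment[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1).mean()

def bifurcation_point(endpoint, segment, other_segments, factor=4.0):
    """Average of the nearby nearest neighbors of `endpoint` in the other segments."""
    threshold = factor * mean_nn_distance(segment)
    candidates = []
    for other in other_segments:
        dist = np.linalg.norm(other - endpoint, axis=1)
        j = int(np.argmin(dist))          # nearest neighbor in this segment
        if dist[j] <= threshold:          # discard candidates that are too far away
            candidates.append(other[j])
    if not candidates:
        return None                       # endpoint is a free end, nothing to merge
    return np.mean(candidates, axis=0)
```

Returning \texttt{None} for an endpoint without nearby candidates is a design choice of this sketch; it marks a terminal end of the tree that is not merged with any other segment.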

% \paragraph{(2) End Point Detection} We define an endpoint to be the last point in the sequence. ....direction calculated as...The first few nearest neighbors of an endpoint all flow in the same general direction. The average pairwise inner product between their $k$ nearest neighbors is calculated for each point. The point with the highest score has the most similar flow to its nearest neighbors and is chosen as the endpoint.

% \paragraph{(3) Flow-Based Sequencing} From the endpoint, the point with the lowest flow-weighted distance is chosen as the next point in the sequence. Given the current endpoint and flow $\mathbf{x}_\text{cur}$ and $\mathbf{f}_\text{cur}$, and a candidate point $\mathbf{x}_i$ with flow $\mathbf{f}_i$, the flow-weighted distance $\text{d}_\text{flow}$ is calculated as
% \begin{align}
%     \text{d}_\text{flow}(\mathbf{x}_\text{cur}, \mathbf{x}_i) = (1 + \alpha\mathbf{f}_\text{cur}^T\mathbf{f}_i)||\mathbf{x}_i - \mathbf{x}_\text{cur}||,
% \end{align}
% where $\alpha$ determines the importance of the current flow and $||\cdot||$ denotes the vector-norm. We set $\alpha$ to $0.25$. The sequencing terminates once no more points are available or when the distance to the next point in the sequence is 4 times larger than the cumulative average distance. Segments containing fewer than 5 points are considered noise and discarded. Stages 2 and 3 are repeated until no new segments are found.

% \paragraph{(4) Segment Merging} Segments $S = \{s_1, \cdots, s_n\}$ are merged by assigning a common intersection point. Given an endpoint $\mathbf{x}_i$ belonging to segment $s_i$, its nearest neighbors from the segments $S \setminus \{s_i\}$ are candidate intersection points. Candidate points with distances greater than 4 times the average nn-distance are discarded. If the nn-distance equals zero, the corresponding segment is already merged with $s_i$ and is skipped. The common intersection point between the segments is the average of the nearest neighbors and is assigned to its respective position in the sequences. The merging process is repeated for all segments.
