\documentclass{midl} % Include author names

\usepackage{soul} % for highlighting
\usepackage{lscape,booktabs}
\usepackage[table]{xcolor}
\jmlryear{2025}
\jmlrworkshop{Full Paper -- MIDL 2025}
\jmlrvolume{-- nnn}
\editors{Accepted for publication at MIDL 2025}

\title[CCTA Plane Prediction and Segmentation]{Cardiac Computed Tomography Angiography Plane Prediction and Comprehensive LV Segmentation}

\midlauthor{
\Name{Davis Marc Vigneault\nametag{$^{1}$}} \orcid{0000-0003-3798-9812} \Email{dvigne01@stanford.edu}\\
\addr $^{1}$ Department of Radiology, Stanford University, Stanford, CA, USA \AND
\Name{Ashish Manohar \nametag{$^{1,2,3}$}} \orcid{0000-0001-5239-6260} \Email{ashman@stanford.edu}\\
\addr $^{2}$ Division of Cardiovascular Medicine, Department of Medicine, Stanford University, Stanford, CA, USA \\
\addr $^{3}$ Cardiovascular Institute, Stanford University, Stanford, CA, USA \AND
\Name{Abraham Hernandez \nametag{$^{2}$}} \Email{xbraham@stanford.edu}\\
\Name{Krista Tin Chi Wong \nametag{$^{2}$}} \Email{kristaw@stanford.edu}\\
\Name{Fanwei Kong \nametag{$^{4}$}} \orcid{0000-0003-1190-565X} \Email{kongf@wustl.edu}\\
\addr $^{4}$ Department of Mechanical Engineering and Materials Science, Washington University in St. Louis, St. Louis, MO, USA \AND
\Name{Tea Gegenava \nametag{$^{2}$}} \Email{gegenavat@yahoo.com}\\
\Name{Koen Nieman\midljointauthortext{Contributed equally}\nametag{$^{1,2,3}$}} \Email{knieman@stanford.edu}\\
\Name{Dominik Fleischmann\midlotherjointauthor\nametag{$^{1,3}$}} \orcid{0000-0003-0715-0952} \Email{d.fleischmann@stanford.edu}\\
}

% Abbreviations
\newcommand{\ccta}{CCTA\xspace}
\newcommand{\fov}{FOV\xspace}
\newcommand{\lv}{LV\xspace}
\newcommand{\sax}{SAX\xspace}
\newcommand{\vla}{2CH\xspace}
\newcommand{\oft}{3CH\xspace}
\newcommand{\hla}{4CH\xspace}
\newcommand{\stn}{STN\xspace}
\newcommand{\unet}{U-Net\xspace}
\newcommand{\relu}{ReLU\xspace}

% Versions
\newcommand{\vPython}{3.12.6\xspace}
\newcommand{\vMonai}{1.5\xspace}
\newcommand{\vPyTorch}{2.5.1+cu124\xspace}
\newcommand{\vRoMa}{1.5.0\xspace}
\newcommand{\vUbuntu}{24.04\xspace}

% Cohort
\newcommand{\nttl}{313\xspace}
\newcommand{\ntrn}{250\xspace}
\newcommand{\nval}{30\xspace}
\newcommand{\ntst}{33\xspace}
\newcommand{\nnrm}{89\xspace}
\newcommand{\nlnc}{46\xspace}
\newcommand{\nhcm}{106\xspace}
\newcommand{\ndcm}{72\xspace}

% Image and Architecture Info
\newcommand{\crres}{3.0~mm\xspace}
\newcommand{\crmatrix}{64 \times 64 \times 64}
\newcommand{\crsteps}{4\xspace}
\newcommand{\crfeatures}{32\xspace}

\newcommand{\fnres}{0.8~mm\xspace}
\newcommand{\fnmatrix}{128 \times 128 \times 192}
\newcommand{\fnsteps}{4\xspace}
\newcommand{\fnfeatures}{40\xspace}

\newcommand{\clipmin}{-200}
\newcommand{\clipmax}{600}
\newcommand{\batch}{1\xspace}
\newcommand{\nepoch}{24\xspace}
\newcommand{\lrinit}{$10^{-6}$\xspace}
\newcommand{\lrtarget}{$10^{-4}$\xspace}

\soulregister\ref7
\soulregister\batch7
\soulregister\nepoch7
\soulregister\fov7
\soulregister\fnres7
\soulregister\fnmatrix7
\soulregister\vPython7
\soulregister\vMonai7
\soulregister\vPyTorch7
\soulregister\vRoMa7
\soulregister\vUbuntu7
\soulregister\sax7
\soulregister\vla7
\soulregister\oft7
\soulregister\hla7
\soulregister\cite7
\soulregister\unet7
\soulregister\figureref7

\begin{document}

\maketitle

\begin{abstract}
The use of cardiac computed tomography angiography (\ccta) has dramatically increased over the past decade, with an increasingly recognized role for functional assessment; however, reformatting these datasets into standard cardiac planes and performing quantitative analysis remains time-consuming and disruptive to clinical workflows.
Here, we propose a fully automated, volumetric, end-to-end trained network for simultaneous detection of standard cardiac planes and comprehensive left ventricular (\lv) segmentation in the predicted short axis coordinate system.
The architecture consists of a coarse segmentation module, a transformation module, and a fine segmentation module.
The coarse segmentation module provides an initial segmentation of the full field of view (\fov) axial images at low resolution.
The transformation module predicts the rotations corresponding to the standard cardiac planes (short axis, \sax; two chamber, \vla; three chamber, \oft; and four chamber, \hla) and reformats the source volume into the predicted \sax coordinate system at high resolution.
Finally, the fine segmentation module segments the narrow \fov, high resolution \sax volume.
The dataset consisted of \nttl~\ccta studies partitioned into training, validation, and testing in an 80:10:10 split.
Architectural decisions are justified using ablation experiments.
On the test set, the proposed architecture achieved accurate plane predictions (mean angle errors of $9.1\pm6.2^\circ$, $9.5\pm5.4^\circ$, $9.0\pm5.9^\circ$, and $8.8\pm5.9^\circ$~for the \sax, \vla, \oft, and \hla planes, respectively) and high quality segmentations (Dice scores of $0.955\pm0.008$, $0.928\pm0.016$, and $0.808\pm0.029$ for the bloodpool, myocardium, and trabeculations, respectively).
This fully automated pipeline has the potential to replace current manual workflows, expediting the availability of standard cardiac planes and quantitative analysis for clinical interpretation.

\end{abstract}

\begin{keywords}
Cardiac computed tomography angiography (\ccta), segmentation, spatial transformer network (\stn)
\end{keywords}

\section{Introduction}

The use of cardiac computed tomography angiography (\ccta) in the United States increased $85\%$ over the previous decade \cite{reeves_cardiac_2021}, with outpatient, inpatient, and emergency department exams all more than doubling in frequency.
This trend is likely to continue or accelerate owing to the increasing availability of scanners capable of performing high quality cardiac exams, incorporation of coronary CT angiography as a Class I recommendation in the AHA/ACC clinical practice guidelines on the evaluation of chest pain \cite{gulati_2021_2021}, and doubling of reimbursement by Medicare in the United States starting in 2025 \cite{maxwell_coronary_2024}.
Moreover, there is an increasingly recognized role of retrospectively ECG-gated cine acquisitions for functional assessment \cite{peper_functional_2020}, with incremental value over coronary CTA alone \cite{seneviratne_incremental_2010}.
Reformatting these images into standard cardiac planes is critical for standardized comparison between exams, wall thickness measurements, and myocardial segment classification; however, this processing is time consuming and usually requires a third party software package outside the standard clinical PACS system.

The literature on medical image segmentation is extensive, with deep neural networks yielding excellent performance over the past decade.
\citet{Ronneberger2015} first introduced the \unet, a highly successful 2D encoder-decoder architecture with skip connections.
Since then, a plethora of modifications to the \unet have been proposed.
Residual, recurrent, and residual-recurrent versions have been described \cite{he_deep_2016,Milletari2016,alom_recurrent_2019}.
\citet{oktay_attention_2018} added attention gates, using saliency maps to preserve only relevant activations.
Additional connections within \cite{huang_densely_2017, jegou_one_2017} or between \cite{zhou_unet_2020} the network layers have been added to enhance information flow.
Multiple \unet{}s have been combined into ``cascaded'' networks, which in their simplest form provide the predictions of one \unet module as an input to a second \unet module \cite{liu_cascaded_2021}, while more sophisticated implementations densely connect the network layers of successive \unet modules \cite{wu_inner_2023}.
Most recently, more sophisticated approaches using transformers \cite{chen_transunet_2021,cao_swin-unet_2021} and graph neural networks \cite{kong_deep-learning_2021} have also been proposed.
Many of these concepts have been applied to \ccta segmentation \cite{bruns_deep_2020,li_8-layer_2021,jun_guo_automated_2020,wang_auto_2022,kong_deep-learning_2021}.

Combined segmentation and detection of standard cardiac planes from \ccta has received much less attention.
The most closely related work \cite{chen_automated_2021} describes a method to predict \sax, \vla, \oft, and \hla~planes by branching a fully connected network from a \unet~bottleneck; however, several architectural and training decisions deserve further exploration. (a) Separate models are trained to predict each cardiac plane, multiplying training time, but without comparing to a single unified model. (b) Their network is trained using a multi-stage approach, but without comparing to end-to-end training. (c) Regarding the fully connected network branched from the bottleneck, no experiments are reported exploring the effect of hidden layers (their presence, number, or width) on performance. (d) Promising modifications to the \unet such as attention gates and residual blocks are not explored. (e) Having learned the transformation parameters, it is reasonable to question whether performance could be improved by segmenting the reformatted images in a second stage; however, this was not investigated.

Therefore, the purpose of this study was to develop a fully automated, volumetric, end-to-end trained network for simultaneous detection of the standard cardiac planes (\sax, \vla, \oft, and \hla) and comprehensive left ventricular (\lv) segmentation (bloodpool, myocardium, and trabeculations) in the predicted \sax coordinate system.

\section{Methods}

\subsection{Dataset}

The dataset consisted of \nttl \ccta studies randomly partitioned into training, validation, and testing in an approximately 80:10:10 split (\ntrn~training, \nval~validation, and \ntst~testing).
Cases were obtained as part of routine clinical practice and were retrospectively collected with IRB approval.
Final clinical diagnoses were normal ($N=\nnrm$), hypertrophic cardiomyopathy ($N=\nhcm$), \lv non-compaction ($N=\nlnc$), and dilated cardiomyopathy ($N=\ndcm$).
Acquisitions were retrospectively ECG-gated and reconstructed at mid-diastole.
Studies were obtained from one of four scanners: SOMATOM Force (Siemens Healthineers; $N=275$), SOMATOM Definition Flash (Siemens Healthineers; $N=36$), LightSpeed VCT (General Electric Healthcare; $N=1$), or Sensation 64 (Siemens Healthineers; $N=1$).
Slice thicknesses were 0.75~mm for Siemens and 0.625~mm for GE scans. The median reconstructed field of view (\fov) diameter was 190.0~mm (interquartile range: 173.0--209.0~mm), with a median in-plane pixel spacing of 0.37~mm (interquartile range: 0.34--0.41~mm).
Initial myocardial and bloodpool segmentations were obtained using a previously described network \cite{kong_deep-learning_2021} and trabeculations were separated from the bloodpool by thresholding.
These initial segmentations were manually corrected (AM, AH, and KW) using ITK-SNAP version 3.8.0 \citep{Yushkevich2019}.
Standard cardiac planes were defined by a cardiologist with fellowship training in cardiac imaging and 10 years of experience (TG).

\subsection{Proposed Architecture}

\begin{figure}[htbp]
\floatconts
  {fig:architecture}
  {\caption{Proposed Network Architecture, consisting of a coarse segmentation module, a transformation module, and a fine segmentation module, trained end-to-end.}}
  {\includegraphics[width=1.0\linewidth]{figures/architecture}}
\end{figure}

The proposed network architecture (\figureref{fig:architecture}) consists of three end-to-end-trained modules: (a) a coarse segmentation module, which segments a large \fov, low resolution image, (b) a transformation module, which predicts the rotations corresponding to the standard cardiac planes, and (c) a fine segmentation module, which segments a narrow \fov, high resolution image reformatted in the \sax coordinate system.
Additionally, the two segmentation modules are cascaded by resampling the coarse segmentation logits and providing these as additional channels to the input of the fine segmentation module.

\subsubsection{Coarse Segmentation Module}

The coarse segmentation module takes as input the axial \ccta volume downsampled to \crres isotropic with a $\crmatrix$ matrix size and produces as output an equivalently sized multi-class segmentation (bloodpool, myocardium, and trabeculations).
The architecture used is a volumetric attention residual \unet with \crsteps downsampling/upsampling steps.
The number of features produced by each convolution block is \crfeatures in the highest resolution stage and is doubled at each downsampling step and halved at each upsampling step.
The fundamental processing block is made up of a $3 \times 3 \times 3$ convolution, a group normalization layer \cite{wu_group_2018}, and a leaky rectified linear unit (\relu) activation, applied twice at each stage.
This is optionally converted to a residual block by element-wise addition of the input and the result of the last normalization layer, prior to applying the final activation; in practice, convolution and normalization layers are applied within the residual connection to match the feature lengths prior to summing.
Traditional skip connections concatenate feature vectors from matching resolutions in the downsampling and upsampling paths.
These can be converted to attention gates by first multiplying the downsampling path input by a multi-class saliency map learned from the downsampling and upsampling path inputs \cite{oktay_attention_2018}.
The result is passed into a final convolution to produce the raw logits corresponding to the background and foreground classes.
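
As a worked illustration of the sizes implied above (our own arithmetic, assuming that each downsampling step halves the matrix size along every axis and that the bottleneck is the stage reached after the final downsampling step), the coarse input spans
\[
64 \times 3.0~\mathrm{mm} = 192~\mathrm{mm}
\]
along each axis, which is comparable to the median reconstructed \fov diameter of 190.0~mm, and the feature counts and matrix sizes per stage are
\[
32 \rightarrow 64 \rightarrow 128 \rightarrow 256 \rightarrow 512,
\qquad
64^3 \rightarrow 32^3 \rightarrow 16^3 \rightarrow 8^3 \rightarrow 4^3 .
\]
Under these assumptions, the bottleneck from which the transformation module branches has 512 features on a $4 \times 4 \times 4$ grid.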

\subsubsection{Transformation Module}

The transformation module is responsible for learning the rotations corresponding to the standard cardiac planes, calculating the \lv centroid, and resampling the input image and the coarse segmentation logits into the learned \sax coordinate system (to be passed as input to the fine segmentation module).
The \sax~plane is chosen for the second segmentation stage because, unlike the long axis planes, the \sax~plane is routinely reviewed as a stack from base to apex and maps directly onto the bullseye plots commonly used to display downstream analyses such as wall thickness, wall thickening, and segmental strain \citep{chen_myocardial_2023}.
The predicted rotations are represented as quaternions, a compact four-parameter representation widely used in the graphics community because it avoids the gimbal lock of Euler angles and allows rotations to be composed by quaternion multiplication.
A matrix of quaternions is predicted as the output of one or more fully connected layers branched from the bottleneck of the coarse segmentation module.
Note, however, that the rotations describing the standard cardiac planes ($Q_{\mathrm{\sax}}$, $Q_{\mathrm{\vla}}$, etc.) are the composite of (a) a shared baseline rotation ($Q_{\mathrm{BLN}}$) orienting the long axis of the \lv perpendicular to the plane of the image and (b) an additional rotation specific to each plane ($Q_{\Delta\mathrm{\sax}}$, $Q_{\Delta\mathrm{\vla}}$, etc.).
Therefore, rather than predicting the final rotations directly, the network is trained to predict a matrix containing the baseline rotation and the plane-specific rotational offsets $[Q_{\mathrm{BLN}}, Q_{\Delta\mathrm{\sax}}, Q_{\Delta\mathrm{\vla}}, Q_{\Delta\mathrm{\oft}}, Q_{\Delta\mathrm{\hla}}]$.
The \lv centroid is estimated directly from the coarse segmentation prediction probabilities.
Using the \sax rotation quaternion and \lv centroid, the axial input image and coarse segmentation logits are resampled into the \sax coordinate system at \fnres isotropic with a $\fnmatrix$ matrix size.
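
The composition described above can be written explicitly.
The following is a sketch in our own notation rather than a statement of the implementation: $\otimes$ denotes the Hamilton product, the order of composition (baseline rotation applied first, followed by the plane-specific offset) is an assumption, and the probability-weighted form of the centroid is one plausible reading of ``estimated directly from the prediction probabilities.''
For each plane $P \in \{\mathrm{SAX}, \mathrm{2CH}, \mathrm{3CH}, \mathrm{4CH}\}$,
\[
Q_{P} = Q_{\Delta P} \otimes Q_{\mathrm{BLN}},
\qquad
\hat{\mathbf{c}} = \frac{\sum_{\mathbf{x}} p_{\mathrm{LV}}(\mathbf{x}) \, \mathbf{x}}{\sum_{\mathbf{x}} p_{\mathrm{LV}}(\mathbf{x})},
\]
where $p_{\mathrm{LV}}(\mathbf{x})$ is a hypothetical symbol for the predicted foreground probability at voxel position $\mathbf{x}$ of the coarse grid.
Both expressions are differentiable with respect to the network outputs, so, provided the resampling step is itself differentiable as in a spatial transformer, gradients from the fine segmentation loss can reach the transformation and coarse segmentation modules during end-to-end training.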

\subsubsection{Fine Segmentation Module}

The axial input image (and coarse segmentation logits when cascading is employed) are resampled into the \sax~coordinate system at \fnres~isotropic with a $\fnmatrix$ matrix size and provided as input to the fine segmentation module.
Like the coarse segmentation module, the fine segmentation module is a volumetric attention residual \unet, starting with \fnfeatures features in the highest resolution stage, but otherwise identical to the former.
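
For comparison with the coarse stage, the sizes implied for the fine stage can be worked out in the same way (again our own arithmetic, assuming that each downsampling step halves the matrix size along every axis).
The fine input spans
\[
128 \times 0.8~\mathrm{mm} = 102.4~\mathrm{mm}
\quad \mathrm{and} \quad
192 \times 0.8~\mathrm{mm} = 153.6~\mathrm{mm}
\]
along its shorter and longer axes, respectively, and the feature counts and matrix sizes per stage are
\[
40 \rightarrow 80 \rightarrow 160 \rightarrow 320 \rightarrow 640,
\qquad
128 \times 128 \times 192 \rightarrow 64 \times 64 \times 96 \rightarrow 32 \times 32 \times 48 \rightarrow 16 \times 16 \times 24 \rightarrow 8 \times 8 \times 12 .
\]
The fine input therefore contains $128 \cdot 128 \cdot 192 / 64^3 = 12$ times as many voxels as the coarse input.
If all four coarse logit channels (background plus three foreground classes) are passed when cascading is employed, which is our assumption, the fine module receives $1 + 4 = 5$ input channels.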

\subsection{Network Implementation and Training}

\subsubsection{Preprocessing and Augmentation}

The training dataset was augmented at runtime by applying random rotations ($100\%$ probability, $\pm 45^\circ$ along each axis) and adding random Gaussian noise ($50\%$ probability, $\sigma \in [0,100]$~HU).
Note that variation in the predicted centroid and \sax quaternion consequently varies the volume provided to the fine segmentation module, resulting in additional implicit augmentation.
Following augmentation, input images were clipped to the range $[ \clipmin, \clipmax ]$ Hounsfield units (``vascular windows'') and normalized to the range $[0, 1]$.
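
Written out, and assuming that the normalization is a linear rescaling of the clipped range (the text does not specify the form), a voxel intensity $x$ in Hounsfield units is mapped to
\[
x' = \frac{\min\bigl(\max(x, -200),\, 600\bigr) + 200}{800} .
\]
For example, $x = 0$~HU maps to $x' = 0.25$ and $x = 200$~HU maps to $x' = 0.5$, while all intensities at or below $-200$~HU map to $0$ and all intensities at or above $600$~HU map to $1$.
On this scale, the largest noise standard deviation used for augmentation ($100$~HU) corresponds to $100/800 = 0.125$ of the normalized range.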

\subsubsection{Network Training}

The coarse and fine segmentation modules are supervised using mean Jaccard loss across all classes ($L_c$ and $L_f$, respectively).
For the transformation module, we provide both \emph{direct} supervision of the predicted quaternions $[Q_{\mathrm{BLN}}, Q_{\Delta\mathrm{\sax}}, Q_{\Delta\mathrm{\vla}}, Q_{\Delta\mathrm{\oft}}, Q_{\Delta\mathrm{\hla}}]$ and \emph{indirect} supervision of the composite quaternions $[Q_{\mathrm{\sax}}, Q_{\mathrm{\vla}}, Q_{\mathrm{\oft}}, Q_{\mathrm{\hla}}]$.
The loss $L_q$ is the sum of the mean squared errors between the ground truth and predicted quaternions for both direct and indirect rotations, which is mathematically closely related to the angle between the rotations they represent.
The total network loss $L_t$ is then given as a weighted sum of these losses:

\begin{equation}\label{eq:loss}
L_t = \alpha_c L_c + \alpha_q L_q + \alpha_f L_f
\end{equation}

\noindent We set $\alpha_c = \alpha_f = 1$ and $\alpha_q = 10/n_q$ where $n_q$ is the total number of quaternions being supervised.
Additional training and implementation details are given in Appendix~\ref{appendix:implementation}.
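
Two points about $L_q$ can be made concrete; both are our own reading rather than a description of the implementation.
First, if the five directly supervised and four indirectly supervised quaternions are all counted, then $n_q = 5 + 4 = 9$ and $\alpha_q = 10/9 \approx 1.11$ for the proposed network.
Second, the relationship between the quaternion error and the rotation angle can be stated explicitly under the assumptions that the predicted quaternion $\hat{q}$ and ground truth quaternion $q$ both have unit length and that their signs are chosen such that $\langle q, \hat{q} \rangle \geq 0$ (since $q$ and $-q$ represent the same rotation).
Writing $\theta$ for the angle of the rotation that takes one orientation to the other,
\[
\| q - \hat{q} \|^2 = 2 - 2 \langle q, \hat{q} \rangle = 2 \left( 1 - \cos \frac{\theta}{2} \right) = 4 \sin^2 \frac{\theta}{4} \approx \frac{\theta^2}{4}
\]
for small $\theta$ in radians.
For instance, $\theta = 9^\circ$ gives $\| q - \hat{q} \|^2 \approx 0.0062$; a mean over the four quaternion components is one quarter of this value.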

\section{Experiments and Results}

Results of the hyperparameter search and ablation experiments are presented in \tableref{tab:angles} (angle errors) and \tableref{tab:dice} (centroid errors and Dice scores).
Regarding the transformation module, the depth and width of the hidden layers branched from the coarse segmentation module bottleneck were varied.
Among these, the version with two 128-feature hidden layers (abbreviated ``128-128'') performed best in terms of centroid error ($0.805\pm0.521$~mm), angle error for three of the four standard cardiac planes ($9.1\pm6.2^\circ$ \sax, $9.0\pm5.9^\circ$ \oft, and $8.8\pm5.9^\circ$ \hla), and angle error for the baseline rotation $Q_{\mathrm{BLN}}$ ($6.6\pm3.7^\circ$).
For the \vla plane, the angle error was similar between the 128-128 and best performing networks.
Regarding segmentation performance, the 128-128 network performed slightly worse compared to the best performing network in terms of myocardial Dice ($0.928\pm0.016$ vs $0.930\pm0.016$, $p<0.05$) and trabeculation Dice ($0.808\pm0.029$ vs $0.814\pm0.030$, $p<0.05$).
Bloodpool Dice was similar between the 128-128 and best performing networks.
Because the 128-128 network performed best overall in predicting the standard cardiac planes and differences in Dice score compared to the best performing networks were small, the 128-128 network was selected as the baseline for subsequent ablation experiments; representative segmentations and plane predictions are shown in \figureref{fig:predictions}.

Ablation experiments were performed to explore the value of attention gates, residual blocks, cascading, indirect and direct rotation supervision, end-to-end training, the fine segmentation module, multiple vs single plane predictions, and hidden layers in the transformation module.
Metrics which demonstrated a statistically significant change compared to the proposed network by paired Student's $t$-test ($\alpha=0.05$) are reported below.
Removing the attention gates degraded performance in terms of centroid error but improved trabeculation Dice ($0.813\pm0.029$ versus $0.808\pm0.029$, $p<0.05$).
Removing residual blocks degraded performance for bloodpool Dice.
Removing cascading (that is, providing only the resampled input image without the coarse segmentation logits to the fine segmentation module) degraded performance in terms of the baseline rotation $Q_{\mathrm{BLN}}$ but improved trabeculation Dice ($0.817\pm0.029$ versus $0.808\pm0.029$, $p<0.05$).
Removing indirect supervision of the quaternion rotations degraded performance in terms of the angle errors for all standard cardiac planes and for myocardial Dice.
Removing direct supervision of the quaternion rotations degraded performance in terms of the baseline rotation $Q_{\mathrm{BLN}}$.

To test the effect of end-to-end training, we sequentially trained the coarse segmentation, transformation, and fine segmentation modules (8 epochs each, 24 epochs total), resulting in degraded performance for all metrics.
To test the utility of our two-stage segmentation approach, we removed the fine segmentation module, instead inputting the full field of view, high-resolution images to the first segmentation stage (requiring a reduction in the number of features in the first-stage \unet by a factor of 4 due to GPU memory constraints).
Doing so degraded performance in terms of the \oft and \hla~angle errors, but improved bloodpool Dice ($0.958\pm0.008$ vs $0.955\pm0.008$, $p<0.05$) and trabeculation Dice ($0.834\pm0.029$ vs $0.808\pm0.029$, $p<0.05$).
To test the utility of predicting all standard cardiac planes in a single network, we trained four separate networks, each predicting a single cardiac plane, following the approach taken by \citet{chen_automated_2021}.
Note that the input image and coarse segmentation logits were resampled into whichever clinical plane was predicted, as the \sax rotation was not always available.
The \sax-, \vla-, and \hla-only networks all exhibited degraded performance in terms of bloodpool and trabeculation Dice.
The \oft-only network was not significantly different in terms of any metric.
Finally, removing all hidden layers from the transformation module degraded performance in terms of angle errors for the baseline rotation $Q_{\mathrm{BLN}}$, \sax, \vla, and \oft planes, and additionally degraded performance in terms of bloodpool Dice.

\include{tables/angles.tex}

\section{Discussion and Conclusions}

Here, we present a fully automated, volumetric, end-to-end trained network for simultaneous detection of standard cardiac planes (\sax, \vla, \oft, and \hla) and comprehensive \lv segmentation (bloodpool, myocardium, and trabeculations) in the predicted \sax coordinate system.
The network had high performance in terms of standard cardiac plane detection, with sub-millimeter mean centroid error and mean angle error $<10^\circ$ for all standard cardiac planes.
The Dice scores achieved by our network are also high ($0.955\pm0.008$, $0.928\pm0.016$, and $0.808\pm0.029$ for the bloodpool, myocardium, and trabeculations, respectively), which is notable given the separate segmentation of the \lv trabeculations, a high surface-area-to-volume structure.
Note that segmentation of the \lv trabeculations has value in the investigation of diagnoses such as \lv non-compaction cardiomyopathy \cite{manohar_quantitative_2023} but is not typically included as a separate label in segmentation models.

Several key points may be gleaned from our ablation experiments.
First, end-to-end training resulted in significantly improved performance for all metrics compared to training each module separately for a fixed total number of epochs.
Second, training separate models to predict each cardiac plane individually (the approach taken by \citet{chen_automated_2021}) failed to significantly improve angle errors, despite quadrupling the total training time required compared to our single unified model.
Third, providing direct supervision of the quaternions, while not significantly changing the final composite rotations or segmentation performance, was necessary to provide accurate intermediate rotations, which are useful in the event that the predicted planes require manual correction.
Fourth, we found that the number and width of hidden layers in the transformation module were important hyperparameters with significant impact on network performance.


This work has several limitations and areas for future improvement and validation.
First, it would be useful to quantify intra- and inter-observer variability in standard cardiac plane angles in order to contextualize the angle errors observed in our network.
Second, several potential improvements to the segmentation modules, particularly the use of transformer-based modules, have the potential to improve segmentation performance and should be investigated.
Third, whereas our intention in adding cascading (passing the resampled logits of the coarse segmentation module to the fine segmentation module) was to improve segmentation performance, removing cascading instead resulted in significantly \emph{higher} angle error for the baseline rotation and slightly \emph{higher} trabeculation Dice; this paradoxical result is not fully explained by our experiments and deserves further investigation.
Fourth, removing the fine segmentation module degrades \oft and \hla cardiac plane prediction while very slightly \emph{improving} bloodpool and trabeculation Dice; these somewhat counterintuitive results also deserve further investigation.
Fifth, the dataset was obtained retrospectively from a single center; the proposed network should undergo further validation on prospectively acquired, multicenter data.
Last, although we explore through ablation experiments many of the features which distinguish our network from the most closely related work \cite{chen_automated_2021}, a direct head-to-head comparison would be valuable.

This fully automated pipeline has the potential to replace current manual workflows, expediting the availability of standard cardiac planes and quantitative analysis for interpretation.



\include{figures/representative-results.tex}

\clearpage

\midlacknowledgments{This study was supported by grants from the Radiological Society of North America (RR24-065; DV), the Etta K. Moskowitz Foundation (DV), the American Heart Association (AHA 24POST1187968; AM), and the National Institutes of Health (NHLBI R01 HL146754; KN).}


\bibliography{midl25_185}

\appendix

\section{Implementation Details}\label{appendix:implementation}

The network was trained end-to-end with a batch size of \batch~for \nepoch~epochs using the Adam optimizer.
Learning rate warmup was used with an initial rate of $10^{-6}$ and a target rate of $10^{-4}$, achieved using a linear ramp over $3$ epochs.
After the third epoch, the learning rate was exponentially decayed with a multiplicative factor of $0.9$.
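
The schedule described above can be summarized as follows.
This is a sketch under two assumptions not stated in the text: that the decay factor is applied once per epoch, and that the ramp is linear in the (possibly fractional) epoch index $e$.
\[
\eta(e) = \left\{
\begin{array}{ll}
10^{-6} + \left( 10^{-4} - 10^{-6} \right) \frac{e}{3}, & 0 \leq e \leq 3, \\[4pt]
10^{-4} \cdot 0.9^{\,e - 3}, & e > 3 .
\end{array}
\right.
\]
Under these assumptions, the learning rate at the end of training ($e = 24$) would be $10^{-4} \cdot 0.9^{21} \approx 1.1 \times 10^{-5}$.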

The network was implemented in Python (version \vPython) using MONAI (version \vMonai) and PyTorch (version \vPyTorch).
Conversion between quaternion and matrix rotation representations was performed using RoMa (version \vRoMa).
Experiments were run on an Ubuntu workstation (version \vUbuntu) with a 16-core Intel i7-13700K processor, 64 GB RAM, and a single NVIDIA GeForce RTX 4090 GPU with 24 GB memory.
Please see the repository for additional details.\footnote{\url{https://github.com/sudomakeinstall/2025-midl-ccta-plane-prediction}}

\include{tables/dice.tex}

\end{document}
