\section{Experiments and Results}
\label{sec:results}

\noindent\textbf{Atrophy Localization.}
We first validate the effectiveness of our method in identifying atrophy in subcortical brain regions affected by AD. To this end, we use the FSL FIRST tool~\cite{fsl_first} to segment these regions and compute the mean anomaly score for each, as shown in Fig.~\ref{fig_5}. AD patients exhibit notably higher anomaly scores in the hippocampus (left: $0.282 \pm 0.495$, right: $0.185 \pm 0.382$) and amygdala (left: $0.132 \pm 0.207$, right: $0.108 \pm 0.208$) than healthy controls do in the hippocampus (left: $0.108 \pm 0.193$, right: $0.069 \pm 0.125$) and amygdala (left: $0.066 \pm 0.115$, right: $0.072 \pm 0.111$). These results are in line with the clinical expectation that these regions are significantly affected by AD pathology~\cite{alzheimer_affect}, indicating that \textit{MORPHADE} is able to identify atrophy in clinically relevant brain regions.\\
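
As a minimal sketch of this region-wise aggregation, with notation introduced here purely for illustration ($A$ for a voxel-wise anomaly map and $\Omega_r$ for the set of voxels assigned to region $r$ by the segmentation), the mean anomaly score of a region can be written as
\[
  s_r = \frac{1}{|\Omega_r|} \sum_{v \in \Omega_r} A(v).
\]
This is one natural reading of the procedure; the exact aggregation behind the reported values may differ in detail.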

\begin{figure}[t!]
  \centering
  \centerline{\includegraphics[width=0.99\columnwidth]{figures/fig6_v1.png}}
\caption{Anomaly scores for subcortical brain regions in Alzheimer's Disease (AD) and Healthy Control (HC) samples, showing markedly higher scores for AD samples in the hippocampus and amygdala, consistent with the clinical literature~\cite{alzheimer_affect}.}
\label{fig_5}
\end{figure}

\noindent\textbf{Atrophy Severity.}
We next evaluate the ability of our method to determine the severity of the localized anomalies by comparing our anomaly maps to medial temporal lobe atrophy (MTA) scores~\cite{mta} assessed by a senior board-certified neuroradiologist. These scores range from 0 to 4 and are assigned based on the degree of structural change observed in the choroid fissure, the temporal horn of the lateral ventricle, and the hippocampus. Fig.~\ref{fig_6} shows a visual correlation between the MTA scores and the degree of atrophy highlighted by the anomaly maps in these key regions, demonstrating the utility of our method in determining the severity of detected anomalies.\\

\begin{figure}[t!]
  \centering
  \centerline{\includegraphics[width=0.85\columnwidth]{figures/fig5_v2.png}}
\caption{Anomaly maps for AD patients alongside their corresponding medial temporal lobe atrophy (MTA) scores, demonstrating consistent alignment with AD-related structural changes and clinical MTA assessments.}
\label{fig_6}
\end{figure}

\noindent\textbf{Pathology Detection.}
We now assess the capability of \textit{MORPHADE} to detect AD at the patient level. Table~\ref{tab::benchmark_anomaly_detection} reports the Area Under the Receiver Operating Characteristic curve (AUROC) obtained by our method and by various baselines when distinguishing subjects with AD from healthy control (HC) subjects. Our model achieves an AUROC of 0.80, surpassing even the 3D supervised baselines ResNet~\cite{supervised_resnet} and DenseNet~\cite{supervised_densenet}, which achieve AUROCs of 0.77 and 0.74, respectively.
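
For reference, the AUROC admits a probabilistic reading that makes these values easier to interpret. Writing $s_{\mathrm{AD}}$ and $s_{\mathrm{HC}}$ for the patient-level scores of a randomly drawn AD subject and a randomly drawn HC subject (notation introduced here for illustration only, and assuming that higher scores indicate AD),
\[
  \mathrm{AUROC} = \Pr\left(s_{\mathrm{AD}} > s_{\mathrm{HC}}\right) + \frac{1}{2}\,\Pr\left(s_{\mathrm{AD}} = s_{\mathrm{HC}}\right),
\]
so that a value of 0.5 corresponds to chance-level ranking and a value of 1 to perfect separation of the two groups.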

Furthermore, we obtain improved performance compared to methods proposed for unsupervised anomaly detection. These methods are only available in 2D, so they were assessed slice-wise, with the final anomaly score for each patient obtained by averaging over the slices. f-AnoGAN~\cite{fanogan} and Ganomaly~\cite{ganomaly} obtained AUROCs of 0.70 and 0.72, respectively. We also outperform Brainomaly~\cite{brainomaly} (AUROC 0.78), a method that is not strictly unsupervised since it requires pathological samples during training for improved performance.
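
One way to write this slice-wise aggregation, with notation assumed here for illustration ($s_{p,i}$ for the anomaly score of slice $i$ of patient $p$, and $N_p$ for the number of slices of that patient), is
\[
  s_p = \frac{1}{N_p} \sum_{i=1}^{N_p} s_{p,i},
\]
where $s_p$ is the patient-level score entering the AUROC computation. How each 2D baseline derives its per-slice score is method-specific and is not detailed here.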

We also compare our results to a 3D adversarial AE to illustrate the benefit of utilizing the deformation fields in our method. Fig.~\ref{fig_3} shows the reconstructions and residual maps obtained with both methods for representative AD and HC subjects. Our method produces more refined reconstructions than the adversarial AE, as shown by the improved mean absolute error (MAE) and Structural Similarity Index (SSIM) scores. Moreover, the residual maps show fewer false positives for the healthy subject, while accentuating pathological areas for the AD subject. Using these improved residual maps alone for AD detection achieves an AUROC of 0.77, compared to 0.74 for the adversarial AE.
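
For reference, both reconstruction metrics follow their standard definitions; the symbols below are introduced for illustration, and the window size and constants are implementation choices that are not specified here. For an input $x$ and a reconstruction $\hat{x}$ with $N$ voxels,
\[
  \mathrm{MAE}(x, \hat{x}) = \frac{1}{N} \sum_{v=1}^{N} \left| x_v - \hat{x}_v \right|,
\]
\[
  \mathrm{SSIM}(x, \hat{x}) = \frac{\left(2 \mu_x \mu_{\hat{x}} + c_1\right)\left(2 \sigma_{x\hat{x}} + c_2\right)}{\left(\mu_x^2 + \mu_{\hat{x}}^2 + c_1\right)\left(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2\right)},
\]
where $\mu$, $\sigma^2$, and $\sigma_{x\hat{x}}$ denote local means, variances, and the covariance, and $c_1$, $c_2$ are small stabilizing constants. A lower MAE and a higher SSIM both indicate a more faithful reconstruction.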

Finally, we demonstrate the utility of our dual-deformation approach: AD identification with the full method is superior to using only the residual maps from the constrained deformer (AUROC 0.77) or only the folding maps from the unconstrained deformer (AUROC 0.79). It should be noted, however, that the folding maps alone already achieve high performance; this highlights the effectiveness of using deformations to detect anomalies without relying on differences between input and reconstruction.

\input{table_results}

\begin{figure*}[t!]
\centering
\includegraphics[width=0.99\textwidth]{figures/fig2_v3.png}
\caption{A comparison of the performance of \textit{MORPHADE} (\(\beta=10\)) with an adversarial AE for a subject with AD (left) and a healthy control subject (right). The morphological adjustments facilitated by \textit{MORPHADE} enhance reconstruction fidelity, yielding higher Structural Similarity Index (SSIM) values for our method's morphed reconstructions than for those of the adversarial AE. The residual maps also show fewer reconstruction errors for the healthy subject, while highlighting atrophy for the subject with AD.}
\label{fig_3}
\end{figure*}

