\section{Supplementary Material}
In this section, we provide additional qualitative results illustrating failure modes of the VFA method under different preprocessing choices and on out-of-distribution (OOD) image contrasts.

\subsection{Sensitivity to Preprocessing Choices}

We observed that the performance of VFA is sensitive to preprocessing decisions, particularly the cropping of images prior to registration.
In almost all cases, registering the uncropped (full field-of-view) T1-weighted images from the NIMH dataset produced poor alignments, with the label maps showing substantial misalignment and anatomical distortions.
Conversely, cropping the images to the same size as the LUMIR challenge dataset mitigated these issues and resulted in noticeably improved registration outcomes.
% However, this is a methodological limitation of the VFA method on high resolution images 
% This highlights that VFA's robustness is reduced simply by adding 20 pixels to the field of view on either side of the image.
Cropping the images to a fixed size (i.e., $192\times160\times224$ voxels) may not be a viable option in practice if the field of view (FOV) is too tight or the image is large (e.g., 0.8\,mm isotropic resolution).
In most practical scenarios, including clinical ones, registration algorithms are expected to handle a wide range of image sizes and FOVs, including fixed and moving images with different voxel sizes.
% The following figures illustrate some characteristic failure modes for uncropped T1 images.
\autoref{fig:pair_0_frac0p60} to \autoref{fig:pair_19_frac0p60} illustrate the failure modes for uncropped T1 images.
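
As a purely illustrative calculation, suppose a $256^3$ volume were reduced to the $192\times160\times224$ grid by a symmetric center crop. This is an assumption made only for illustration; the crop used in practice need not be centered. The margin removed on each side along axis $d$ would then be $m_d = (N_d - T_d)/2$, where $N_d$ and $T_d$ denote the original and target sizes along that axis:
\[
m = \left(\frac{256-192}{2},\ \frac{256-160}{2},\ \frac{256-224}{2}\right) = (32,\ 48,\ 16)\ \mathrm{voxels},
\]
and the cropped grid would retain
\[
\frac{192 \cdot 160 \cdot 224}{256^3} = 0.75 \times 0.625 \times 0.875 \approx 0.41
\]
of the original voxels. Under this assumption, the uncropped input contains roughly $2.4$ times as many voxels as the cropped one.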

\subsection{Sensitivity to Out-of-Distribution (OOD) Image Contrasts}
We observed that the performance of VFA is also sensitive to image contrast: registration quality degrades on contrasts other than T1-weighted.
\autoref{fig:ood_pair_0_t2} to \autoref{fig:ood_pair_19_flair} illustrate the failure modes for T2-weighted, T2*-weighted, and FLAIR images.

% Macro for crop-nocrop figures
% Usage: \cropnocropfig{image_path}{label}
\newcommand{\cropnocropfig}[2]{%
    \begin{figure}[h!]
        \centering
        \includegraphics[width=\linewidth]{#1}
        \caption{Performance of VFA on an image pair cropped to conform to the LUMIR dataset specification (top row) and on the same image pair at the original NIMH size of $256^3$ voxels (bottom row). Segmentation labels are shown using the FreeSurfer color lookup table (LUT).}
        \label{#2}
    \end{figure}
}

\cropnocropfig{figures/crop-nocrop/pair_0_crop_nocrop_frac0p60.png}{fig:pair_0_frac0p60}
\cropnocropfig{figures/crop-nocrop/pair_3_crop_nocrop_frac0p55.png}{fig:pair_3_frac0p55}
\cropnocropfig{figures/crop-nocrop/pair_3_crop_nocrop_frac0p60.png}{fig:pair_3_frac0p60}
\cropnocropfig{figures/crop-nocrop/pair_7_crop_nocrop_frac0p60.png}{fig:pair_7_frac0p60}
\cropnocropfig{figures/crop-nocrop/pair_8_crop_nocrop_frac0p55.png}{fig:pair_8_frac0p55}
\cropnocropfig{figures/crop-nocrop/pair_8_crop_nocrop_frac0p60.png}{fig:pair_8_frac0p60}
\cropnocropfig{figures/crop-nocrop/pair_9_crop_nocrop_frac0p50.png}{fig:pair_9_frac0p50}
\cropnocropfig{figures/crop-nocrop/pair_9_crop_nocrop_frac0p55.png}{fig:pair_9_frac0p55}
\cropnocropfig{figures/crop-nocrop/pair_9_crop_nocrop_frac0p60.png}{fig:pair_9_frac0p60}
\cropnocropfig{figures/crop-nocrop/pair_10_crop_nocrop_frac0p45.png}{fig:pair_10_frac0p45}
\cropnocropfig{figures/crop-nocrop/pair_10_crop_nocrop_frac0p55.png}{fig:pair_10_frac0p55}
\cropnocropfig{figures/crop-nocrop/pair_12_crop_nocrop_frac0p60.png}{fig:pair_12_frac0p60}
\cropnocropfig{figures/crop-nocrop/pair_13_crop_nocrop_frac0p55.png}{fig:pair_13_frac0p55}
\cropnocropfig{figures/crop-nocrop/pair_15_crop_nocrop_frac0p55.png}{fig:pair_15_frac0p55}
\cropnocropfig{figures/crop-nocrop/pair_19_crop_nocrop_frac0p60.png}{fig:pair_19_frac0p60}

% Macro for T2 OOD failure figures
% Usage: \oodfailureT{image_path}{label}
\newcommand{\oodfailureT}[2]{%
    \begin{figure}[htbp]
        \centering
        \includegraphics[width=\linewidth]{#1}
        \caption{Performance of VFA on T2-weighted images from the NIMH dataset. Segmentation labels are shown using the FreeSurfer color lookup table (LUT).}
        \label{#2}
    \end{figure}
}

% Macro for T2star OOD failure figures
% Usage: \oodfailureTstar{image_path}{label}
\newcommand{\oodfailureTstar}[2]{%
    \begin{figure}[htbp]
        \centering
        \includegraphics[width=\linewidth]{#1}
        \caption{Performance of VFA on T2*-weighted images from the NIMH dataset. Segmentation labels are shown using the FreeSurfer color lookup table (LUT).}
        \label{#2}
    \end{figure}
}

% Macro for FLAIR OOD failure figures
% Usage: \oodfailureFLAIR{image_path}{label}
\newcommand{\oodfailureFLAIR}[2]{%
    \begin{figure}[htbp]
        \centering
        \includegraphics[width=\linewidth]{#1}
        \caption{Performance of VFA on FLAIR images from the NIMH dataset. Segmentation labels are shown using the FreeSurfer color lookup table (LUT).}
        \label{#2}
    \end{figure}
}

% T2 images
\oodfailureT{figures/ood-failure/pair_0_t2_frac0p50.png}{fig:ood_pair_0_t2}
\oodfailureT{figures/ood-failure/pair_11_t2_frac0p50.png}{fig:ood_pair_11_t2}
\oodfailureT{figures/ood-failure/pair_15_t2_frac0p50.png}{fig:ood_pair_15_t2}
\oodfailureT{figures/ood-failure/pair_17_t2_frac0p50.png}{fig:ood_pair_17_t2}

% T2star images
\oodfailureTstar{figures/ood-failure/pair_1_t2star_frac0p50.png}{fig:ood_pair_1_t2star}
\oodfailureTstar{figures/ood-failure/pair_5_t2star_frac0p50.png}{fig:ood_pair_5_t2star}
\oodfailureTstar{figures/ood-failure/pair_12_t2star_frac0p50.png}{fig:ood_pair_12_t2star}
\oodfailureTstar{figures/ood-failure/pair_13_t2star_frac0p50.png}{fig:ood_pair_13_t2star}
\oodfailureTstar{figures/ood-failure/pair_18_t2star_frac0p50.png}{fig:ood_pair_18_t2star}

% FLAIR images
\oodfailureFLAIR{figures/ood-failure/pair_2_flair_frac0p50.png}{fig:ood_pair_2_flair}
\oodfailureFLAIR{figures/ood-failure/pair_5_flair_frac0p50.png}{fig:ood_pair_5_flair}
\oodfailureFLAIR{figures/ood-failure/pair_11_flair_frac0p50.png}{fig:ood_pair_11_flair}
\oodfailureFLAIR{figures/ood-failure/pair_19_flair_frac0p50.png}{fig:ood_pair_19_flair}


