\section{Discussion}
\label{sec:discussion}

\paragraph{Modern Deep Learning designs show improved task understanding on familiar modalities}
Deformable image registration is a highly spatially non-local task that requires both global and local context from the fixed and moving images.
Since the task is predominantly formulated as an inverse variational optimization problem, most established approaches optimize the warp field directly using various parametric and non-parametric optimization methods.
Early deep learning methods for registration used standard convolutional designs to recast the inverse problem as a prediction task.
However, these methods generalized poorly to out-of-distribution data and were unable to register images with different modalities, resolutions, or anatomies.
Modern methods borrow design elements from iterative optimization methods~\citep{jian2024mamba,bailiang} to improve generalization to out-of-distribution data.
These designs have been shown to be highly effective at improving robustness to domain shift on in-distribution contrasts, and even generalize to the brains of human-adjacent species such as the macaque, showing that these designs learn task-aware representations.
However, these methods still slightly underperform iterative optimization methods on in-distribution data while consuming an order of magnitude more computational resources at inference time~\citep{fireants}.
Iterative methods are therefore significantly more resource efficient while still achieving competitive performance, which makes them a suitable choice for practical applications and deployment on edge devices.
This also highlights that there is still room for improvement in efficient deep learning designs for registration.
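
For concreteness, the variational formulation referred to above can be written in a generic form. The notation is assumed here purely for illustration, since none of these symbols is defined elsewhere in this section:
\begin{equation}
\varphi^{*} = \arg\min_{\varphi} \; \mathcal{D}\left(F,\, M \circ \varphi\right) + \lambda\, \mathcal{R}(\varphi),
\end{equation}
where $F$ and $M$ denote the fixed and moving images, $\varphi$ the warp field, $\mathcal{D}$ an image dissimilarity measure, $\mathcal{R}$ a regularizer on the warp, and $\lambda \geq 0$ a weight balancing the two terms.
Iterative methods minimize this objective separately for each image pair, whereas learning-based methods train a network to predict $\varphi$ directly from the pair $(F, M)$.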


\paragraph{Deep Learning methods do not generalize to out-of-distribution data}
Contrary to the claims made in the LUMIR challenge, and in accordance with the well-established literature on domain shift, deep learning methods do not generalize to out-of-distribution data despite robust performance on in-distribution data.
A segmentation and parcellation algorithm like SLANT generates ROIs that are not always well-defined in terms of intensity boundaries and are often biased by the internal representation of the model, which could lead to spurious results for intensity-based registration algorithms.
To alleviate this potential pitfall, we use SynthSeg to generate high-quality labelmaps for the T2, T2*, and FLAIR modalities on the NIMH dataset, so that the fidelity of each labelmap is derived from the image itself.
Our evaluation shows that the performance gap between optimization and deep learning methods on these modalities is significantly larger than on the T1 modality, showing that out-of-distribution generalization remains a challenge for deep learning methods.
Our data preprocessing and evaluation protocol is made publicly available in our code repository for reproducibility and transparency.
Since these results differ markedly from the claims made in the LUMIR challenge, we advocate for evaluation protocols that reflect practical clinical and research workflows rather than conditions that may inadvertently favor particular method classes.

\paragraph{Deep Learning methods are sensitive to preprocessing choices}
A key overlooked aspect of registration challenges in general is the choice of preprocessing steps.
Registration challenges typically standardize the data to a common orientation, resolution, and voxel size to provide a controlled evaluation environment.
However, real-world data, including histology, blockface images, and \textit{ex-vivo} hemispheres, are rarely aligned to stereotaxic coordinates and rarely have standard resolutions.
Registration algorithms must therefore be able to handle a wide range of input conditions, including varying coordinate systems, orientations, and voxel sizes.
Our experiments show that a model trained on images with $192\times160\times224$ voxels fails catastrophically on the NIMH dataset at its original size of $208\times256\times256$ voxels, whereas cropping the extra padding significantly improves performance.
Other preprocessing choices may degrade performance in similar ways, and such failures can be hard to debug.

\paragraph{Code availability}
All our preprocessing and evaluation scripts are made publicly available in our code repository (\href{https://github.com/rohitrango/lumirage-evals}{https://github.com/rohitrango/lumirage-evals}) for reproducibility and transparency.