
% is a high-stakes application 
Medical image analysis requires deep learning models that are accurate and robust and that generalize well to unseen data. However, when deployed in real-world scenarios, deep neural networks often suffer performance degradation~\cite{hendrycks_benchmarking_2019, kamann_benchmarking_2021}. This generalization gap can be attributed to a range of factors, including variations in patient populations, differences in image acquisition, and imaging artefacts. Strategies to improve generalization include data augmentation techniques such as intensity shifts, affine transforms, and noise addition~\cite{garcea_data_2023, goceri_medical_2023}.
% Common data augmentation strategies, such as affine transformations and intensity shifts, can capture well-defined variations in images, such as patient positioning and image acquisition settings~\cite{garcea_data_2023}.
As they can demonstrably improve out-of-distribution generalization~\cite{boone_rood-mri_2023}, they are among the standard set of augmentations used in many deep learning segmentation models, such as nnU-Net~\cite{isensee_nnu-net_2021}. 

However, standard augmentation strategies cannot cover variations arising from more complex image formation mechanisms, which in MRI could include bias fields due to coil miscalibration, Rician noise in scans acquired at higher resolutions, ghosting artefacts, or random RF spikes during acquisition (see Fig.~\ref{fig:example_variations}). Artefact-specific augmentation policies~\cite{boone_rood-mri_2023} or pre-processing methods such as bias field correction might mitigate this problem, but such methods are not guaranteed to exist for every source of variation, for instance Rician noise, ghosting, and unseen issues due to data mishandling~\cite{shimron_implicit_2022}, and may be missed entirely in automated workflows. Furthermore, explicitly anticipating all possible variations is often infeasible.
Hence, as an alternative, augmentation strategies that are not specifically designed for any particular variation and yet manage to mitigate the effect of multiple variations would greatly benefit deep learning in medical imaging.
% This raises an important question: \textit{Are there augmentation strategies that can improve the generalisability and robustness of models without prior knowledge of possible variations?}

% In many cases, such corruption in scans is not a major problem, as scans can often be retaken after correcting the underlying issue. However, in some cases, this is not possible, such as when patients are uncooperative or when a contrast agent is used.

% Moreover, these augmentations can only minimally or not at all simulate a distribution shift induced by different machines, different protocols in hospitals, and variations in population due to different geological positions.
% \begin{figure}[tbp]
%     \centering
%     \includegraphics[width=0.75\linewidth]{figures/simple_pipeline.png}
%     \caption{We consider a simple model of acquiring an MRI image for a patient. In this simple pipeline, there are only three steps. Each of these steps can introduce a different source of variation, which can lead to out of distribution samples that models may fail to generalise for.}
%     \label{fig:simple-pipeline}
% \end{figure}

In this work, we systematically investigate general, \textit{data-agnostic} augmentation strategies, namely MixUp~\cite{zhang_MixUp_2018} and Auxiliary Fourier Augmentation (AFA)~\cite{vaish_fourier-basis_2024}. By data-agnostic, we mean augmentations that do not seek to maintain the visual consistency of the data being augmented. We demonstrate the effect of these techniques in nnU-Net models for segmentation of cardiac cine MRI and prostate MRI. Neither MixUp nor AFA explicitly addresses specific sources of variation in these data, yet we show how they improve segmentation performance in various out-of-distribution generalization settings. Moreover, we include an analysis of the learned feature representations, showing improved structure and interpretability when MixUp and AFA are used.
% While models may perform equally well on common segmentation metrics, it is also important for generalisability that the learned features are structured and informative..
% Therefore, we consider it valuable that we analyse the changes in the learnt feature representation to gain further insights into the effects of these augmentation strategies.
% , which are prone to variability due to factors such as imaging conditions and patient-specific characteristics. 
% without directly addressing or reproducing specific sources of variation.
% These methods were chosen due to their simplicity of being added to the training method, being data agnostic, and they challenge the traditional paradigm of augmenting data in a visually consistent way.
Our findings provide new insights into the effectiveness and limitations of these augmentation methods in medical image analysis scenarios and show that MixUp and AFA can improve the performance of deep neural networks in multiple tasks and generalization settings. % . % Our contributions are as follows. 
% We 

% settings. 
% \begin{enumerate}[noitemsep]
%     \item We assess and compare the effectiveness of conventional augmentations, MixUp, and AFA in an nnU-Net framework exposed to distribution shifts.
%     % \item We analyze the performance degradation of deep neural networks in MRI segmentation tasks in the presence of out-of-distribution samples. 
%     % Our experiments show that these augmentations, which do not explicitly model distribution shifts or corruptions, substantially enhance out-of-distribution generalisation performance for medical segmentation tasks.
%     \item We show that MixUp and AFA can improve the performance of nnU-Nets in multiple tasks and generalization settings. 
%     \item We demonstrate that MixUp and AFA lead to intra-class compact and inter-class separable features, a requirement for generalization.
% \end{enumerate}
% First, we assess and compare the effectiveness of conventional augmentations, MixUp, and AFA in an nnU-Net framework exposed to distribution shifts. Second, Third, we demonstrate that MixUp and AFA lead to intra-class compact and inter-class separable features, a requirement for generalization.


% In this work, we evaluate several options for such an augmentation strategy and provide new insights into the current capability and shortcomings of augmentation techniques. Specifically, we focus on the segmentation of  cardiac and prostate MRI images. MRI imaging is susceptible to numerous sources of variation, during~\cite{CITE} and after~\cite{CITE} image acquisition and due to patient-specific factors~\cite{CITE}. We show how the augmentation techniques  MixUp~\cite{CITE} and Auxiliary Fourier Augmentation (AFA)~\cite{CITE} can improve generalization and robustness while not explicitly correcting for any expected variation. 
% Such strategies are a departure from the traditional augmentation techniques. Instead of seeking to simulate the corruptions and transformations of a known set of corruptions, these methods focus on regularising the deep neural network that not only leads to better robustness in presence of imaging artefacts but also promotes enhanced generalisation capabilities.


% consider the issue of generalisation and robustness that persists in medical image analysis, specifically taking MR image analysis and the task of segmentation as our focus, and provide some new insights into the current capability and shortcomings of augmentation techniques.


% \item We study the performance degradation of deep neural networks due to variations in MR images for MRI segmentation tasks for the hippocampus, heart, and prostate. We create a simple breakdown for different sources of variations that might negatively affect model performance.
% These regions were chosen due to their clinical significance and unique challenges each poses, such as variability in anatomical structure, acquisition protocols, and patient-specific factors. We break down various sources of variation that can negatively impact model performance, providing a clear understanding of their effects on segmentation accuracy.
% This analysis, conducted on heart and prostate MRI datasets, provides insights into how these augmentations can enhance model stability and generalization across diverse patient populations and unexpected imaging variations.
% , variations due to demographics, variations at the time of image acquisition, and variations in policies of handling data after acquiring the image.
% \item Showcase effectiveness of non-traditional image augmentation techniques that go beyond simulating the expected variations in real world scenario, even for deep neural networks in medical image analysis.