
In this study, we have demonstrated how non-standard augmentation techniques that do not target specific variations, namely MixUp and Auxiliary Fourier Augmentation (AFA), can enhance the robustness of state-of-the-art segmentation frameworks such as nnU-Net against many variations in MRI.

While MixUp is known to be an effective augmentation for various tasks~\cite{eaton-rosen_improving_2018, thulasidasan_mixup_2019, gazda_mixup_2022}, we find that it is also well suited for overcoming challenging medical imaging conditions. However, we also observe that on P158, MixUp without base augmentations leads to a (non-significant) decline in performance. Our results highlight the advantage of combining augmentation strategies that exploit intrinsically different mechanisms. For example, AFA follows a fundamentally different strategy from MixUp by directly perturbing $k$-space data. This is reflected in our results, in which the combination of both always improves over their individual use. This finding is corroborated by an evaluation of the feature space using $k$-variance gradient-normalized margins, a metric that we consider a promising tool for studying model generalizability.
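
To make the contrast between the two mechanisms concrete, it helps to recall the commonly stated formulation of base MixUp. The notation below ($x_i$, $y_i$, $\lambda$, $\alpha$) is introduced here for illustration only and is an assumption rather than a description of our exact configuration. Given two training samples $(x_i, y_i)$ and $(x_j, y_j)$, where $y$ denotes a one-hot label map, MixUp trains on their convex combination
\[
\tilde{x} = \lambda x_i + (1-\lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1-\lambda)\, y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha),
\]
with $\alpha > 0$ a hyperparameter. The interpolation therefore acts jointly in image space and label space across pairs of samples, whereas AFA perturbs the frequency-domain representation of an image.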

Our results demonstrate that MixUp and AFA not only improve robustness to distribution shifts but also maintain comparable performance on the original, non-transformed dataset. This is a significant advantage, as robustness techniques such as adversarial training or aggressive noise injection often introduce biases that degrade performance on standard tasks~\cite{tsipras_robustness_2019, hendrycks_using_2019, zhang_theoretically_2019, geirhos_shortcut_2020}. These biases arise because such methods overfit to the augmented or corrupted data. In contrast, the augmentations considered here promote feature compactness and separability without disrupting the underlying data distribution, so that performance on the original dataset remains comparable. This balance between robustness and accuracy is critical for clinical applications, where models must perform reliably on both clean and challenging data.

% these augmentations rely on inherent data variation to be effective, as using MixUp with base augmentations always results in significant improvements, and are not universally beneficial under all conditions. These limitations underscore the necessity of always testing even a general augmentation strategy. 


Our results align with studies incorporating general augmentation strategies into nnU-Net~\cite{atya_non_2021}, and MixUp and AFA are straightforward additions with substantial benefits. However, augmentations cannot address all generalization gaps. We find that prostate zonal segmentation remains challenging due to significant inter-subject variability. One example is the effect of age: younger cohorts have sharper tissue boundaries~\cite{allen_age-related_1989, situmorang_prostate_2012}. Such differences are difficult to address without prior knowledge, regardless of the augmentations used.

While the augmentations we treat in this work are valuable tools for improving robustness, there remains substantial potential for further advancements in this domain. For example, we have used the base version of MixUp, which has previously been shown to be sufficient~\cite{atya_non_2021}, but many variants exist~\cite{cao_survey_2024}. We further discuss CutMix, and how it pairs with other augmentations, in Appendix~\ref{app:further_exp}. However, most other variants would require extensive changes to the nnU-Net framework for limited benefit~\cite{liu_cut_2024}, and some are superfluous to analyze: for instance, because nnU-Net uses deep supervision~\cite{shen_object_2020}, the use of base MixUp is already similar to Manifold-MixUp~\cite{verma_manifold_2019}. Other data-agnostic augmentations are beyond the scope of this discussion and would need to be adapted to the medical domain. One example is PRIME~\cite{modas_prime_2022}, which considers only the RGB color space, which is characteristically different from multi-parametric MRI, and is not implemented for 3D volumes. Therefore, our work can be viewed as a stepping stone toward broader research on using simple, general augmentations for out-of-distribution generalization in medical imaging, as opposed to more complicated model-based methods such as GANs and diffusion models~\cite{garcea_data_2023}.

In conclusion, we find that adding non-standard, data-agnostic augmentations to a state-of-the-art nnU-Net model can consistently and significantly increase segmentation performance under various generalization challenges in both cardiac cine MRI and prostate MRI. This could enhance the reliability of segmentation models under diverse and challenging conditions in clinical practice.

% , our approach reduces the need to retake images, saving valuable time and resources while minimizing patient discomfort. Robust models also enable more reliable automated pipelines, ensuring consistent performance across varying imaging protocols and scanner types. Furthermore, these improvements provide a stronger foundation for active learning frameworks, where robust models serve as better starting points for iterative refinement with limited annotated data. Overall, our work contributes to more efficient, accurate, and generalizable medical image analysis, ultimately supporting better diagnostic and treatment outcomes in real-world clinical workflows.}


