\PassOptionsToPackage{table,xcdraw,dvipsnames}{xcolor}
\documentclass{midl} % Include author names

% The following packages will be automatically loaded:
% jmlr, amsmath, amssymb, natbib, graphicx, url, algorithm2e
% ifoddpage, relsize and probably more
% make sure they are installed with your latex distribution

% \usepackage{mwe} % to get dummy images
\usepackage{mathrsfs}
\usepackage{multicol}
\usepackage{multirow}
\usepackage{enumitem}
\usepackage{array}
% \usepackage{caption}
% \usepackage{subcaption}
\usepackage{booktabs} % For professional-looking tables
\usepackage{makecell} % For multi-line headers
\usepackage{pgfplots}
\pgfplotsset{compat=1.17}
\usepgfplotslibrary{groupplots}
\usepackage{pgfplotstable}

% Define the teal color
\definecolor{newtextcolor}{rgb}{0.0, 0.5, 0.5}

% Define the newtext wrapper
\newcommand{\newtext}[1]{%
    {\color{newtextcolor}#1}%
}

\newcommand{\blue}[1]{%
    {\color[rgb]{.5,.5,1}#1}%
}

\newcommand{\red}[1]{%
    {\color[rgb]{1,.5,.5}#1}%
}


\makeatletter
\newenvironment{customlegend}[1][]{%
    \begingroup
    % inits/clears the lists (which might be populated from previous
    % axes):
    \pgfplots@init@cleared@structures
    \pgfplotsset{#1}%
}{%
    % draws the legend:
    \pgfplots@createlegend
    \endgroup
}%

% Multi-line left-aligned text with manual line breaks.
% The base line is in centre.
\newcommand*{\mline}[1]{%
\begingroup
    \renewcommand*{\arraystretch}{1.1}%
   \begin{tabular}[c]{@{}>{\raggedright\arraybackslash}p{2cm}@{}}#1\end{tabular}%
  \endgroup
}
% makes \addlegendimage available (typically only available within an
% axis environment):
\def\addlegendimage{\pgfplots@addlegendimage}

\jmlrvolume{-- 93}
\jmlryear{2025}
\jmlrworkshop{Full Paper - MIDL 2025}
\editors{Accepted for publication at MIDL 2025}

% \title[Beyond Traditional Augmentations for MR Image Segmentation]{In Search for Augmentations for MR Image Segmentation: \\ Beyond Traditional Augmentation Strategies}
\title[Augmentations for the Unknown]{Data-Agnostic Augmentations for Unknown Variations: \\ Out-of-Distribution Generalisation in MRI Segmentation}
 % Use \Name{Author Name} to specify the name.
 % If the surname contains spaces, enclose the surname
 % in braces, e.g. \Name{John {Smith Jones}} similarly
 % if the name has a "von" part, e.g \Name{Jane {de Winter}}.
 % If the first letter in the forenames is a diacritic
 % enclose the diacritic in braces, e.g. \Name{{\'E}louise Smith}

 % Two authors with the same address
 % \midlauthor{\Name{Author Name1} \Email{abc@sample.edu}\and
 %  \Name{Author Name2} \Email{xyz@sample.edu}\\
 %  \addr Address}

 % Three or more authors with the same address:
 % \midlauthor{\Name{Author Name1} \Email{an1@sample.edu}\\
 %  \Name{Author Name2} \Email{an2@sample.edu}\\
 %  \Name{Author Name3} \Email{an3@sample.edu}\\
 %  \addr Address}


% Authors with different addresses:
% \midlauthor{\Name{Author Name1} \Email{abc@sample.edu}\\
% \addr Address 1
% \AND
% \Name{Author Name2} \Email{xyz@sample.edu}\\
% \addr Address 2
% }

%\footnotetext[1]{Contributed equally}

% More complicated cases, e.g. with dual affiliations and joint authorship
\midlauthor{\Name{Puru Vaish\midljointauthortext{Corresponding author}\nametag{$^{1}$}} \Email{p.vaish@utwente.nl}\\
\Name{Felix Meister\nametag{$^{2}$}} \Email{felix.meister@siemens-healthineers.com}\\
\Name{Tobias Heimann\nametag{$^{2}$}} \Email{tobias.heimann@siemens-healthineers.com}\\
\Name{Christoph Brune\nametag{$^{1}$}} \Email{c.brune@utwente.nl}\\
\Name{Jelmer M. Wolterink\nametag{$^{1}$}} \Email{j.m.wolterink@utwente.nl}\\
\addr $^{1}$ Department of Applied Mathematics, Technical Medical Centre, University of Twente \\
\addr $^{2}$ Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany
}

\begin{document}

\maketitle

\begin{abstract}
Medical image segmentation models are often trained on curated datasets, leading to performance degradation when deployed in real-world clinical settings due to mismatches between training and test distributions. While data augmentation techniques are widely used to address these challenges, traditional, visually consistent augmentation strategies lack the robustness needed for diverse real-world scenarios. In this work, we systematically evaluate alternative augmentation strategies, focusing on MixUp and Auxiliary Fourier Augmentation. These methods mitigate the effects of multiple variations without explicitly targeting specific sources of distribution shift. We demonstrate how these techniques significantly improve out-of-distribution generalisation and robustness to imaging variations across a wide range of transformations in cardiac cine MRI and prostate MRI segmentation. We find quantitatively that these augmentation methods enhance learned feature representations by promoting separability and compactness. Additionally, we highlight how their integration into nnU-Net training pipelines provides an easy-to-implement, effective solution for enhancing the reliability of medical segmentation models in real-world applications.
% Image segmentation models are often trained with highly curated datasets and suffer performance degradation when deployed in real-world clinical settings due to mismatches between training and test distributions. Data augmentation techniques are commonly used to mitigate these issues, but traditional augmentations may not provide sufficient robustness. In this work, we explore the effectiveness of two established augmentation techniques, i.e., MixUp and Auxiliary Fourier Augmentation, to improve model regularization and robustness against a wide range of transformations. We show that these augmentation strategies significantly improve the generalization of medical segmentation models under various distribution shifts. Moreover, we show how these augmentation strategies improve the performance of nnU-Net models, offering an easy-to-implement solution for improving model reliability in real-world medical applications.
\end{abstract}

\begin{keywords}
MRI, segmentation, data augmentation, generalisation, robustness
\end{keywords}

\section{Introduction}\label{sec:intro}
\input{sections/introduction}

% \section{Related Works}\label{sec:rel}
% \input{sections/related_works}

% \section{Preliminary}\label{sec:prelim}
% \input{sections/preliminary}

\section{Materials and Methods}\label{sec:method}
\input{sections/methodology}

\section{Experiments and Results}\label{sec:results}
All hyperparameters, program code and implementation details can be found in Appendix~\ref{app:hyp_rep}.
\input{sections/results}

\section{Discussion and Conclusion}\label{sec:conc}
\input{sections/conclusion}

\clearpage  % Acknowledgements, references, and appendix do not count toward the page limit (if any)
% Acknowledgments---Will not appear in anonymized version
\midlacknowledgments{
This publication is part of the project ROBUST: Trustworthy AI-based Systems for Sustainable Growth with project number KICH3.LTP.20.006, which is (partly) financed by the Dutch Research Council (NWO), Siemens Healthineers, and the Dutch Ministry of Economic Affairs and Climate Policy (EZK) under the program LTP KIC 2020-2023.}

\bibliography{midl25_93}

\clearpage
\appendix

\section{More Examples of Variations}
\input{sections/appendix/image_variations}

\section{Example Images of Augmentation}
This section shows example images produced by the augmentation strategies. Fig.~\ref{fig:afa_aug} shows an example of AFA, and Fig.~\ref{fig:mixup_aug} shows the result of MixUp applied to both the images and the segmentation masks.
% \begin{figure}[!htb]
%     \centering
%     \includegraphics[width=0.8\textwidth]{figures/original_images.png}
%     \caption{These are the original images from the ACDC dataset.}}
% \end{figure}

\begin{figure}[!htb]
    \centering
    \includegraphics[width=\textwidth]{figures/afa_augmentation.png}
    \caption{Images augmented using AFA. When applied to a 2D slice, AFA adds planar waves of varying degree and amplitude to the image. The labels are left unaffected.}
    \label{fig:afa_aug}
\end{figure}
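The planar waves visible in Fig.~\ref{fig:afa_aug} can be illustrated with a single additive Fourier basis function. As a minimal sketch, assuming one wave per 2D slice with pixel coordinates $(u,v)$, frequency $f$, orientation $\theta$, phase $\phi$ and amplitude $a$ (symbols introduced here for illustration only; the exact sampling of these parameters follows the AFA method and is not reproduced here), an augmented slice $\tilde{x}$ is obtained from the original slice $x$ as
\begin{equation*}
\tilde{x}(u,v) = x(u,v) + a\,\sin\!\bigl(2\pi f\,(u\cos\theta + v\sin\theta) + \phi\bigr), \qquad \tilde{y} = y .
\end{equation*}
The wave is constant along lines perpendicular to the direction $(\cos\theta, \sin\theta)$, which produces the stripe patterns in the figure, and the label map $y$ is left unchanged because this additive intensity perturbation does not displace any anatomy.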

\begin{figure}[!htb]
    \centering
    \includegraphics[width=\textwidth]{figures/mixup_augmentation.png}
    \caption{Effect of MixUp on both the images and the labels. The first row shows samples to which MixUp has been applied. Rows 2 and 3 show the myocardium ground-truth labels of the two samples before mixing; the other classes are omitted for clarity of visualisation. The last row shows how the segmentation masks are combined into a probability mask, which is then used as the ground truth during loss computation.}
    \label{fig:mixup_aug}
\end{figure}
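The probability mask in the last row of Fig.~\ref{fig:mixup_aug} follows from applying the same convex combination to images and labels. Assuming the standard MixUp formulation, with a mixing coefficient $\lambda$ drawn from a $\operatorname{Beta}(\alpha,\alpha)$ distribution (an illustrative assumption; the value of $\alpha$ is not restated here), two training pairs $(x_i, y_i)$ and $(x_j, y_j)$ with one-hot label maps are combined as
\begin{align*}
\tilde{x} &= \lambda\, x_i + (1-\lambda)\, x_j, \\
\tilde{y} &= \lambda\, y_i + (1-\lambda)\, y_j, \qquad \lambda \sim \operatorname{Beta}(\alpha,\alpha).
\end{align*}
Because $y_i$ and $y_j$ are one-hot at every voxel, each class in $\tilde{y}$ takes a value in $[0,1]$ and the values sum to one at every voxel, so $\tilde{y}$ is a valid per-voxel probability mask. For example, with $\lambda = 0.7$, a voxel labelled myocardium in $y_i$ and background in $y_j$ receives the target $0.7$ for myocardium and $0.3$ for background.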


\section{Hyperparameters and Reproducibility}\label{app:hyp_rep}
\input{sections/appendix/hyperparamters}


\section{Performance per Severity for All Corruptions}
\input{sections/appendix/all_corruptions_results}%

\clearpage

\section{Latent Space Representations across Initialisations}\label{app:more_pca}
Here we repeat the PCA projection of the final features learned by nnU-Net models trained with different augmentation techniques, across different runs. For this purpose, we take the model trained in each fold of our five-fold cross-validation. The latent space representations for ACDC are shown in Fig.~\ref{fig:emb_vis_folds_acdc} and those for P158 in Fig.~\ref{fig:emb_vis_folds_p158}. From both a qualitative and a quantitative perspective, these repetitions show consistently similar learnt feature representations for images with transformations.
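For reference, the projection underlying these plots can be written as a standard two-dimensional PCA. As a sketch, assuming $h_n \in \mathbb{R}^{C}$ denotes the $C$-dimensional final feature vector at voxel $n$ of the $N$ voxels being visualised (notation introduced here for illustration only), the features are centred and projected onto the two leading principal directions:
\begin{align*}
\mu &= \frac{1}{N}\sum_{n=1}^{N} h_n, \qquad
\Sigma = \frac{1}{N}\sum_{n=1}^{N} (h_n - \mu)(h_n - \mu)^{\top}, \\
z_n &= W^{\top}(h_n - \mu), \qquad W = [\,w_1, w_2\,] \in \mathbb{R}^{C \times 2},
\end{align*}
where $w_1$ and $w_2$ are the eigenvectors of $\Sigma$ with the two largest eigenvalues. Each point $z_n \in \mathbb{R}^2$ is then coloured by the class of voxel $n$, as in the legends of Figs.~\ref{fig:emb_vis_folds_acdc} and~\ref{fig:emb_vis_folds_p158}.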

\begin{figure}[!tbh]
\definecolor{backg}{RGB}{42, 179, 205}
\definecolor{myo}{RGB}{250, 231, 35}
\definecolor{leftv}{RGB}{253, 127, 116}
\definecolor{rightv}{RGB}{150, 55, 173}

    \centering
    \includegraphics[width=\textwidth]{figures/esoteric/pca/acdc/numerous_runs_acdc.png}
\begin{tikzpicture}
    \begin{customlegend}[legend columns=-1,legend style={draw=none,column sep=1ex},legend entries={\small Background, \small Left Ventricle, \small Myocardium, \small Right Ventricle}]
    \addlegendimage{only marks,mark=*,color=backg,fill}
    \addlegendimage{only marks,mark=*,color=leftv,fill}
    \addlegendimage{only marks,mark=*,color=myo,fill}
    \addlegendimage{only marks,mark=*,color=rightv,fill}
    \end{customlegend}
\end{tikzpicture}%

    \caption{Repetitions of our analysis of the learnt feature representations for the ACDC dataset across the folds of our five-fold cross-validation. The main paper shows fold 0; here we show the remaining four folds for the models trained with no augmentation, base augmentation only, and base augmentation combined with MixUp, AFA, or both.}
    \label{fig:emb_vis_folds_acdc}
\end{figure}

\begin{figure}[!tbh]
\definecolor{backg}{RGB}{42, 179, 205}
\definecolor{myo}{RGB}{250, 231, 35}
\definecolor{leftv}{RGB}{253, 127, 116}
\definecolor{rightv}{RGB}{150, 55, 173}

    \centering
    \includegraphics[width=\textwidth]{figures/esoteric/pca/p158/numerous_runs_p158.png}
\begin{tikzpicture}
    \begin{customlegend}[legend columns=-1,legend style={draw=none,column sep=1ex},legend entries={\small Background, \small Transition Zone, \small Peripheral Zone}
    % \small Right Ventricle}
    ]
    \addlegendimage{only marks,mark=*,color=backg,fill}
    \addlegendimage{only marks,mark=*,color=leftv,fill}
    \addlegendimage{only marks,mark=*,color=myo,fill}
    % \addlegendimage{only marks,mark=*,color=rightv,fill}
    \end{customlegend}
\end{tikzpicture}%

    \caption{Repetitions of our analysis of the learnt feature representations for the P158 dataset across the folds of our five-fold cross-validation. The main paper shows fold 0; here we show the remaining four folds for the models trained with no augmentation, base augmentation only, and base augmentation combined with MixUp, AFA, or both.}
    \label{fig:emb_vis_folds_p158}
\end{figure}


% \section{Qualitative Segmentation Performance}
% \input{sections/appendix/quali_segmentation_map}

\section{Evidence of Regularisation}
\input{sections/appendix/regularisation}

\section{Additional Experiments}\label{app:further_exp}
We conducted additional experiments with CutMix~\cite{yun_cutmix_2019} and with one additional brain MRI dataset, the Alzheimer's Disease Neuroimaging Initiative hippocampus segmentation dataset (ADNI)~\cite{frisoni_eadc-adni_2015}, together with an additional out-of-distribution hippocampus segmentation dataset (HFH)~\cite{jafari-khouzani_dataset_2011}, to
evaluate the robustness of our augmentations to image transformations.
We present these preliminary results for robustness to image variations in Tab.~\ref{tab:cutmix_adni_acquisition} and for real-world out-of-distribution data in Tab.~\ref{tab:cutmix_adni_real_world_ds}, and include the results from the main manuscript again for completeness.

\begin{table}[!htb]
    \caption{DSC and HD95 on the original and transformed test sets of ACDC, P158 and ADNI, using either no augmentation or a combination of base, MixUp, CutMix, and AFA augmentations.}
    \label{tab:cutmix_adni_acquisition}
    \input{tables/new_table_avg}
\end{table}

\begin{table}[!htb]
    \centering
    \caption{DSC and HD95 under distribution shift with various data augmentation strategies for cardiac cine MR (tested on M\&Ms), prostate bpMRI (tested on PX), and brain MR (tested on HFH). HFH has no public leaderboard reporting the best model performance.}
    \label{tab:cutmix_adni_real_world_ds}
    \input{tables/new_table_ds_avg}
\end{table}

We found that CutMix provides better generalisation performance on both the original and the real-world OOD test sets. However, its performance on the test set with image transformations is significantly worse than that of MixUp and Auxiliary Fourier Augmentation. This is
expected: by design, CutMix encourages the model to learn better local and global feature representations by replacing patches or
volumes within an MRI scan with those of another scan, but it does not produce samples as diverse as those of MixUp and Auxiliary Fourier
Augmentation and therefore does not regularise the model against image transformations.
The combination of MixUp, CutMix and AFA provides the best of both worlds, yielding out-of-distribution
generalisation to both image variations and real-world datasets.
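
The contrast between CutMix and MixUp can be made concrete with the standard CutMix formulation~\cite{yun_cutmix_2019}. As a sketch, assuming CutMix is applied to segmentation by using the same binary box mask for the images and the label maps (the mask $M$ and its sampling are illustrative assumptions, not a description of our exact implementation), two training pairs $(x_A, y_A)$ and $(x_B, y_B)$ are combined as
\begin{align*}
\tilde{x} &= M \odot x_A + (\mathbf{1} - M) \odot x_B, \\
\tilde{y} &= M \odot y_A + (\mathbf{1} - M) \odot y_B,
\end{align*}
where $M$ is a binary mask of the same spatial size as the input that is zero inside a randomly placed box (a patch in 2D or a sub-volume in 3D) and one elsewhere, applied to every class channel of the label map, and $\odot$ denotes voxel-wise multiplication. Every voxel of $\tilde{x}$ is therefore copied unchanged from one of the two scans and $\tilde{y}$ remains a hard label map, whereas MixUp forms $\lambda x_A + (1-\lambda) x_B$ and thus alters the intensity of every voxel. This is consistent with the observation above that CutMix produces less diverse samples.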

Most notably, the large real-world OOD generalisation gap for prostate bpMRI is
significantly reduced (DSC from 0.737 to 0.772). The resulting DSC is only 5.5\% lower than that obtained by training directly on the ProstateX dataset,
as opposed to 8.9\% lower when using MixUp with base augmentations only.


\clearpage

\section{Structure-Wise Results and Standard Deviations of Metrics}\label{app:struct}
\input{sections/appendix/standard_deviations}

\end{document}
