
In this section, we give a detailed overview of the hyperparameters used to train the nnU-Net models on each dataset.
We make our code available at \href{https://github.com/MIAGroupUT/augmentations-for-the-unknown}{\url{https://github.com/MIAGroupUT/augmentations-for-the-unknown}}.

\paragraph{nnU-Net Hyperparameters}
Tab.~\ref{tab:nnunet_hyperparams} summarises the hyperparameters used to train all nnU-Net models, unless specified otherwise.
\begin{table}[!tbh]
\centering
\caption{Default nnU-Net hyperparameters and modifications used in our experiments.}
\label{tab:nnunet_hyperparams}
\small
\begin{tabular}{lp{8cm}}
\toprule
\textbf{Category} & \textbf{Details} \\ \midrule
\textbf{Architecture} & \\
- Model Type & 2D U-Net (default); full-resolution 3D U-Net for brain MRI. \\
- Ensembles & Predictions ensembled across 5 models (five-fold cross-validation). \\ \midrule
\textbf{Training} & \\
- Optimizer & SGD with Nesterov momentum (momentum = 0.99). \\
- Learning Rate & Initial rate = 0.01, polynomial decay (power = 0.9). \\
- Regularization & Weight decay = 3e-5. \\
- Epochs & Maximum of 200 (modified). \\
- Batch Size & Adaptive to GPU memory (2--5 for 3D, larger for 2D). \\
- Loss Function & Combined Dice Loss and Cross-Entropy Loss. \\ \midrule
\textbf{Data Preprocessing} & \\
- Intensity Normalization & Z-score normalization (per channel). \\
- Resampling & Voxel spacing resampled to median value. \\
- Padding & Mirror padding applied. \\ \midrule
\textbf{Inference} & \\
- Test-Time Augmentation (TTA) & Mirroring along all axes. \\ \midrule
\textbf{Post-Processing} & \\
- Connected Component Analysis & Applied for class-specific refinement. \\ \bottomrule
\end{tabular}
\end{table}
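
To make the learning-rate and loss entries of Tab.~\ref{tab:nnunet_hyperparams} concrete, they can be sketched as follows. This is an illustration under stated assumptions rather than an exact specification: we assume that the polynomial decay is applied once per epoch, that $e$ denotes the current epoch and $E = 200$ the maximum number of epochs (both symbols are our notation), and that the Dice and cross-entropy terms are summed with equal weight.
\[
\eta_e = \eta_0 \left(1 - \frac{e}{E}\right)^{0.9}, \qquad \eta_0 = 0.01, \qquad
\mathcal{L} = \mathcal{L}_{\mathrm{Dice}} + \mathcal{L}_{\mathrm{CE}} .
\]
Under these assumptions, the learning rate halfway through training ($e = 100$) would be $0.01 \cdot 0.5^{0.9} \approx 0.0054$, and it would reach zero at $e = E$.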

\paragraph{Computational Cost and Convergence}

MixUp incurs no additional computational overhead: it requires $1\times$ the FLOPs and memory of the baseline. AFA is more expensive, requiring $2\times$ the FLOPs and $1.62\times$ the memory, owing to its additional Fourier-based transformations and auxiliary losses. Despite this higher per-iteration cost, the number of epochs needed to converge is unaffected: all models, including those trained with AFA and MixUp, fully converge within 200 epochs. The gains in robustness and generalization therefore come at a higher cost per epoch, but they do not require a longer training schedule.

