\section{Introduction}
Algorithms for medical image segmentation always have remaining failure cases, making quality control mandatory \cite{Fournel2021}. Training a neural network that directly predicts segmentation confidence is a simple and computationally efficient solution, and can be adapted to various metrics, differentiable or not. Domain shifts, arising from differences in imaging hardware, patient populations, or acquisition protocols, often derail segmentation models \cite{Guan2022}. Unfortunately, in this situation, where error detection is needed the most, direct confidence prediction is the least reliable: not only does the predictor have to deal with inputs that differ from those it has been trained on, but when segmentations degrade markedly, it also needs to extrapolate beyond the training range of its outputs.

In this work, we address these challenges and substantially increase the robustness of direct confidence prediction for medical image segmentation, with the goal of turning this simple and efficient strategy into a practicable solution. We achieve this with a novel approach to training such predictors, augmenting the training data with adversarial examples that are outside of the original distribution and lead to lower segmentation quality. These adversarial examples are derived from the predictor itself, so that including them in the training along with their true effects establishes a feedback loop in which the predictor learns which perturbations actually affect segmentation quality. We demonstrate that this greatly improves confidence prediction under scanner changes in two real-world cardiac and prostate MRI datasets. Our approach does not require any modification of the underlying segmentation network, and can be adapted to predict different quality metrics.

\rev{Our proposed strategy differs substantially from established adversarial training approaches \cite{Bai:2021} in both its goal and its implementation. While typical methods focus on increasing robustness against adversarial attacks, our aim is to improve generalization to new domains. This shift necessitates extrapolation beyond the original output distribution, which our setting requires because segmentation accuracy is lower in new domains than on the training data. Consequently, we propose a method to generate adversarial examples that reduce segmentation accuracy by a pre-specified amount. Additionally, unlike standard adversarial training, our approach aligns two networks, one for segmentation and one for confidence prediction.}


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../submission"
%%% End:
 