\section{Methodology}

\subsection{Confidence Predictor}
We adapt the ConfidNet architecture \cite{confid-net} to the task of per-image confidence prediction for medical image segmentation. Specifically, we attach a confidence predictor $C_\phi$ at the penultimate resolution level of a U\!-Net $f_\theta$ and train $C_\phi$ to predict $f_\theta$'s true confidence score \rev{$g$}. To illustrate that $C_\phi$ can be trained to predict both overlap- and boundary-based \rev{confidence scores}, our experiments include volumetric and surface dice \cite{MaierHein2024} \rev{as choices of $g$}. In our main experiments, $f_\theta$ is frozen, so that the original segmentation network remains intact and $C_\phi$ can reuse the results of its forward pass. In an ablation study, we demonstrate that fine-tuning a copy of $f_\theta$ for the purpose of confidence prediction, as in ConfidNet, slightly improves confidence prediction further, at an increased computational cost. Details of our architecture are given in Section~\ref{sec:implementation}.

\subsection{Learning the Effects of Adversarial Perturbations}

After pre-training the confidence predictor $C_\phi$ for 100~batches on the same data, and with the same (non-adversarial) augmentations that were used to train $f_\theta$, we start adding adversarial perturbations, as detailed in Algorithm~\ref{alg:adv-training}. Our perturbations are based on the negative gradient of the predicted confidence score with respect to the input image, i.e., they represent a change to the image that $C_\phi$ expects to decrease segmentation quality. By processing these adversarial examples with $f_\theta$, and supervising the training of $C_\phi$ with the resulting confidence scores, we establish a feedback loop in which $C_\phi$ learns which deviations from the original training distribution actually affect segmentation quality.

\begin{algorithm}[t]
    \caption{Adversarial Perturbation Scheme}
    \begin{algorithmic}[1]
        \STATE \textbf{Notation}:
        \STATE \quad U\!-Net $f_\theta$, penultimate resolution level features $z_\theta$
        \STATE \quad Confidence predictor $C_\phi$, true confidence $g: \mathbb{R}^n \times \mathbb{R}^n \rightarrow [0, 1]$
        \STATE \quad \rev{Loss function $\mathcal{L}: [0, 1] \times [0, 1] \rightarrow \mathbb{R}$}
        \STATE Initialize $\theta, \phi, \;\; \rev{\alpha \gets 0.8}, \;\; \eta \gets 0.01$
        \STATE Initialize $\text{ADV\_BUFFER}[0] \gets (x_0, y_0) \in \mathcal{D}$
        
        \WHILE{not converged}
            \STATE \textbf{1. Form a batch of size $2B$:}
            \STATE \quad New input $(x_i, y_i) \in \mathcal{D}$ as clean half $\mathcal{C}$. 
            \STATE \quad $(x'_{i-1}, y_{i-1}) \gets \text{ADV\_BUFFER}[i]$ as adversarial half $\mathcal{A}$.
        
            \STATE \textbf{2. Forward Pass:}
            \FORALL{$(x, y) \in \mathcal{C} \cup \mathcal{A}$}
                \STATE $\hat{y} \gets f_\theta(x), \;\; z \gets z_\theta(x)$ 
                \STATE $\hat{s} \gets C_\phi(z), \;\; s \gets g(\hat{y}, y)$ 
            \ENDFOR
            \STATE \quad $\displaystyle \text{loss} = \sum_{(x, y) \in \mathcal{C} \cup \mathcal{A}}{\mathcal{L}\bigl(\hat{s},\,s\bigr)}$
            \STATE \quad Update $\phi$ by backpropagating $\nabla_\phi (\text{loss})$
        
            \STATE \textbf{3. Compute Next Iteration's Adversarial Perturbations:}
            \FORALL{$(x, y) \in \mathcal{C}$}
                \STATE $\displaystyle \nabla_x \;\gets \frac{\partial\,C_\phi\!\bigl(z_\theta(x)\bigr)}{\partial x}$
                \STATE $e_{\Delta_s} \gets C_\phi\!\bigl(z\bigr) - C_\phi\!\bigl(z'\bigr) - \Delta_s$, \;\; $\alpha \gets \alpha - \eta \cdot e_{\Delta_s}$
                \STATE $\displaystyle \delta \; \gets -\,\alpha \,\frac{\nabla_x}{\|\nabla_x\|^2 + \epsilon}, \;\; x' \gets x + \delta$
            \ENDFOR
            \STATE $\text{ADV\_BUFFER}[i+1] \gets (x'_i, y_i)$
        \ENDWHILE
    \end{algorithmic}
    \label{alg:adv-training}
\end{algorithm}

\begin{figure}[t]
    \centering
    \includegraphics[width=0.48\textwidth]{figures/delta_scores.png}
    \includegraphics[width=0.48\textwidth]{figures/gradient_factors.png}
    \caption{Left: Effects of adversarial perturbations on segmentation network $f_\theta$ and confidence predictor $C_\phi$. Over time, the predictor learns to generate perturbations that actually affect the segmentation. Right: The gradient factor $\alpha$ evolves differently over five runs, illustrating the need to adapt it during training.}
    \label{fig:delta}
\end{figure}

Adversarial perturbations are computed in lines~19--23: In line~22, each image $x$ is modified by a single gradient step with factor $\alpha$, divided by the squared gradient norm. In line~21, $\alpha$ is automatically adjusted so that the predicted confidence is reduced, on average, by a pre-specified amount $\Delta_s$, whose choice is discussed in Section~\ref{sec:learning-strength}. Division by the squared gradient norm is motivated by a first-order Taylor expansion of $C_\phi$, under which the step changes the predicted confidence by an amount that does not depend on the gradient magnitude. A small positive $\epsilon$ guarantees numerical stability.
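
As a brief sketch of this argument, we treat images as flattened vectors, write $\nabla_x$ for the gradient from Algorithm~\ref{alg:adv-training}, and assume a Euclidean norm, differentiability of $C_\phi \circ z_\theta$ at $x$, and negligible higher-order terms for the chosen step. Under these assumptions, the first-order expansion reads
\[
    C_\phi\bigl(z_\theta(x + \delta)\bigr)
    \;\approx\; C_\phi\bigl(z_\theta(x)\bigr) + \nabla_x^\top \delta
    \;=\; C_\phi\bigl(z_\theta(x)\bigr) - \alpha \, \frac{\|\nabla_x\|^2}{\|\nabla_x\|^2 + \epsilon}
    \;\approx\; C_\phi\bigl(z_\theta(x)\bigr) - \alpha .
\]
To first order, the predicted confidence therefore drops by roughly $\alpha$ whenever $\|\nabla_x\|^2 \gg \epsilon$.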

Figure~\ref{fig:delta} (left) illustrates our training process for five runs with $\Delta_s=0.1$. $C_\phi(z) - C_\phi(z')$ is the predicted difference in segmentation quality, based on activations $z$ from the original image and $z'$ from the perturbed one. Adjusting $\alpha$ keeps this predicted difference close to the desired value $\Delta_s=0.1$. Interestingly, the actual difference $g(f_\theta(x), y)-g(f_\theta(x'), y)$ between segmentations of the original image $x$ and the perturbed image $x'$, as rated by the quality metric $g$ with respect to the ground truth $y$, is very low initially, indicating that the predictor does not yet manage to create effective adversarial perturbations. This shows that it initially lacks an understanding of which deviations from the input distribution lead to a deterioration of segmentation quality. After a few dozen iterations, our feedback loop successfully aligns the predictor with the actual behavior of the segmentation network. \rev{This alignment is further illustrated in Figure~\ref{fig:grad-vis}, which shows examples of perturbations created with or without adversarial training along with their actual effects on the segmentation.}

Much of the remaining code in Algorithm~\ref{alg:adv-training} is devoted to an efficient implementation of our training scheme, which saves computation by using each image twice, once with, once without adversarial perturbation. This allows us to update the predictor for the current batch and, in the same forward pass, generate adversarial perturbations which are cached in a buffer to be included in the next batch, leading to a 50:50 ratio of original and perturbed images in each batch. For computing weight updates and image perturbations, we can retain the same computation graph to reduce redundant operations and end up with a minimal overhead that integrates well into modern automatic differentiation frameworks.
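
For concreteness, with the MSE loss described in Section~\ref{sec:implementation}, the objective for one such batch could be written as below. The shorthand $\hat{s}(x) = C_\phi\bigl(z_\theta(x)\bigr)$ is introduced here only for brevity, and any normalization by the batch size is omitted:
\[
    \text{loss} \;=\; \sum_{(x, y) \in \mathcal{C}} \Bigl(\hat{s}(x) - g\bigl(f_\theta(x), y\bigr)\Bigr)^2
    \;+\; \sum_{(x', y) \in \mathcal{A}} \Bigl(\hat{s}(x') - g\bigl(f_\theta(x'), y\bigr)\Bigr)^2 .
\]
Each sum contains $B$ terms, which corresponds to the 50:50 ratio of original and perturbed images, and only $\phi$ is updated because $f_\theta$ is frozen in our main experiments.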

\subsection{Learning the Strength of Adversarial Perturbations} \label{sec:learning-strength}
We introduce a hyperparameter $\Delta_s$ to control the effect of adversarial perturbations on the predictor, thereby making the process both interpretable and consistent during training. We run an ablation study for $\Delta_s \in \{0.05, 0.1, 0.2\}$ (see Table~\ref{tab:ablation}) and find that the framework is robust across these settings. For simplicity, we choose $\Delta_s = 0.1$ for all experiments.

Control over the effect is realized by continuously updating $\alpha$ during adversarial perturbation steps. For a sufficiently small $\alpha$, we assume the effect on $C_\phi$ is monotonic and thus use the simple update rule in line~21 of Algorithm~\ref{alg:adv-training}.
For five runs with similar hyperparameters, we report $\alpha$-values over time in Figure~\ref{fig:delta}~(right). After some iterations, the error in offset $e_{\Delta_s}(z, z')$ stabilizes close to zero, while the gradient factor $\alpha$ continues to evolve over time, illustrating the need to continuously adapt this factor during training.
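
As a worked illustration of this update rule, consider the initial values $\alpha = 0.8$ and $\eta = 0.01$ from Algorithm~\ref{alg:adv-training} together with $\Delta_s = 0.1$. The two predicted drops used below, $0.3$ and $0.02$, are hypothetical values chosen for illustration and are not measurements from our experiments:
\[
    C_\phi(z) - C_\phi(z') = 0.3
    \quad\Rightarrow\quad e_{\Delta_s} = 0.3 - 0.1 = 0.2,
    \qquad \alpha \gets 0.8 - 0.01 \cdot 0.2 = 0.798 ,
\]
\[
    C_\phi(z) - C_\phi(z') = 0.02
    \quad\Rightarrow\quad e_{\Delta_s} = 0.02 - 0.1 = -0.08,
    \qquad \alpha \gets 0.8 + 0.01 \cdot 0.08 = 0.8008 .
\]
A drop that overshoots $\Delta_s$ thus shrinks $\alpha$, while a drop that falls short of $\Delta_s$ enlarges it.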

\subsection{Implementation Details}
\label{sec:implementation}
Since the original ConfidNet \cite{confid-net} cannot be used for image-level confidence prediction from segmentation features, we propose a simple, wide but shallow model: two $3\times 3$ convolution blocks reduce the number of channels from 64 to 8, followed by two fully connected layers that reduce the remaining feature dimension ($64^2 \times 8$ for M\&M and $96^2 \times 8$ for PMRI images) via a hidden dimension of 128 to confidence scores. We train score predictors by attaching them to the penultimate resolution level of a U-Net, which is frozen during training. We use the MSE loss \rev{$\mathcal{L}$} between $\hat{s}$ and $s$, the Adam optimizer with a learning rate of $10^{-5}$ and default parameters, and a batch size of 32 that is effectively doubled in line~8 of Algorithm~\ref{alg:adv-training}. We train all score predictors for at most \rev{100 epochs with 100 batches each} and terminate if the validation loss stops improving\rev{, with a patience of 20 epochs, retaining the checkpoint from 20 epochs earlier if no improvement is seen. We select volumetric and surface dice as confidence scores $g$ and train a separate predictor for each of them. In multi-class settings, we predict class-wise scores and aggregate later.}
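
To make the size of this head concrete, the input dimension of the first fully connected layer follows directly from the numbers above. The weight counts below are simple products that ignore bias terms and assume a standard dense layer:
\[
    64^2 \times 8 = 32{,}768, \qquad 32{,}768 \times 128 = 4{,}194{,}304 \quad \text{(M\&M)},
\]
\[
    96^2 \times 8 = 73{,}728, \qquad 73{,}728 \times 128 = 9{,}437{,}184 \quad \text{(PMRI)}.
\]
Under these assumptions, the first fully connected layer likely accounts for most of the predictor's parameters.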

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../submission"
%%% End:
