\subsection{K-head DL-MODIR guidance}
Our preliminary experiments showed that merely training with a loss term on the organ masks does not provide sufficient diversity in the additional guidance component of the final DIR solutions. This is further described in Appendix X. Based on this, we modified the original architecture by adding a segmentation head. The idea was to create an inductive bias in the encoder towards the organ masks, so that this information could be used when generating the DVF. The schematic of the final approach is shown in Figure X.

\subsection{Weighted MO Training}
In DIR, as in many other MO problems, some objectives may be preferred over others. Consequently, one portion of the approximation front is more desirable than the rest, and more solutions are needed in that region. To achieve this, we used the concept of weighted hypervolume \cite{zitzler2007hypervolume}, wherein, prior to the calculation of the hypervolume, each loss term is multiplied by the following weighting function:
\begin{equation}
    w(\mathfrak{L}) = \frac{e^{-\lambda \mathfrak{L}}}{e^{-\lambda}}
\end{equation}

Subsequently, the weighted hypervolume is:
\begin{equation}
    wHV = HV\left(\frac{e^{-\lambda \mathfrak{L}}}{e^{-\lambda}} \, \mathfrak{L}\right)
\end{equation}

By applying the chain rule and the product rule, the gradients of the weighted hypervolume (needed for MO training) can be obtained as follows:

\begin{equation}
    \dd{wHV}{L^i_j} = \frac{e^{-\lambda L^i_j} - \lambda  L^i_j  e^{-\lambda L^i_j}}{e^{-\lambda}}  \dd{HV}{L^{i*}_j}
\end{equation}
where

\begin{equation}
    L^{i*}_j = \frac{e^{-\lambda L^i_j}}{e^{-\lambda}} L^i_j
\end{equation}
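
For completeness, the intermediate step behind this gradient can be sketched as follows. The shorthand $L = L^i_j$ and $L^{*} = L^{i*}_j$ is introduced only for this sketch, and $\lambda > 0$ is assumed. Applying the product rule to the weighted loss gives
\begin{equation}
    \frac{\partial L^{*}}{\partial L} = \frac{1}{e^{-\lambda}} \left( e^{-\lambda L} - \lambda L \, e^{-\lambda L} \right) = (1 - \lambda L) \, e^{\lambda (1 - L)},
\end{equation}
and the chain rule multiplies this factor by the gradient of HV with respect to $L^{*}$. Under these assumptions, the factor vanishes at $L = 1/\lambda$ and is positive only for $L < 1/\lambda$, so the weighted loss increases with the original loss only in that range.
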
We trained our neural network with this approach and the resulting model is referred to as weighted DL-MODIR.

\subsubsection{Effect of weighted HV}
Weighted HV yields more solutions in the desired region of the approximation front, as shown in Figure~\ref{fig:weighted modir} (column 3 compared with column 2).
\begin{figure*}
    \centering
    \includegraphics[width=\textwidth]{figures/weighted_mo_dir.png}
    \caption{2D projections of the approximation front from simple grid search (column 1), MO DIR (column 2), and weighted MO DIR (column 3). Each point represents the mean loss values over 10 test scans. The marker styles and colors vary with the NCCLoss value for easy comparison between the two views shown in rows 1 and 2.}
    \label{fig:weighted modir}
\end{figure*}




% \begin{figure*}
%     \centering
%     \includegraphics[width=\textwidth]{figures/dice_vs_tre.png}
%     \caption{Segmentation mismatch (represented by 1 - dice coefficient) for each organ contour (Left) and target registration errors for each manually annotated landmark (Right) for a randomly selected test scan pair. The values corresponding to the different DIR solutions on the approximation front are shown by filled circles. The color of each circle correspond to percentage folding in the obtained deformation vector field. The values for the source image are shown with black lines. Due to the conflicting nature of the underlying objectives, it is difficult to select a single solution, which is better in all objectives and all regions.}
%     \label{fig:a posteriori decision making}
% \end{figure*}


The HV calculation (and consequently the HV-maximizing gradients) is also known to be biased towards the region of the Pareto front where the ratio of the loss in one objective to the gain in another objective is close to 1. This is illustrated with experiments on ZDT5 in Figure X. We observed that the Pareto front for NCCLoss vs. Smooth loss also suffers from this issue. Unfortunately, in the given problem, the sparsely covered region of the Pareto front is likely to be of greater importance. We empirically observed that this issue can be mitigated by normalizing the objective space, as shown in Figure X.
However, a real-world problem such as DIR, together with the stochasticity introduced by learning, poses additional challenges. Therefore, a naive normalization of the objective space does not yield the desired results.