\section{Approach}
\label{sec:approach}
We first provide brief background on the concepts of MO optimization that we apply to deep-learning-based DIR. MO optimization refers to simultaneously minimizing\footnote{In this paper, we assume minimization, as the objectives correspond to losses in deep learning.} a vector of $n$ objectives. The goal is to find a set (often referred to as the `approximation set') of $p$ solutions that are both close to, and diversely spread along, the Pareto front, i.e., the set of objective vectors of all Pareto-optimal solutions. A solution is Pareto optimal if none of its objectives can be improved without simultaneously worsening at least one of the other objectives \cite{van2000multiobjective}.
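To make these definitions concrete, the following Python sketch checks Pareto dominance between two loss vectors under minimization and reduces a set of solutions to its non-dominated subset. The function names and loss values are hypothetical and serve only as an illustration. They are not part of our implementation.

\begin{verbatim}
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if loss vector `a` Pareto-dominates `b` (minimization):
    `a` is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the set dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical loss vectors (image similarity, DVF smoothness, seg. similarity).
losses = [(0.2, 0.5, 0.3), (0.3, 0.4, 0.3), (0.3, 0.5, 0.4), (0.1, 0.9, 0.2)]
front = non_dominated(losses)  # drops the third vector
\end{verbatim}

In this example, the third loss vector is dominated (for instance by the first, which is no worse in any objective and better in two), so only the remaining three vectors are Pareto optimal within the set.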

\begin{figure}
    \centering
    \includegraphics[width=0.65\textwidth]{figures/DL-MODIR.png}
    \caption{Illustration of the proposed deep-learning-based MO DIR approach. $I_{source}$: source image, $I_{target}$: target image, $Seg_{source}$ and $Seg_{target}$: organ segmentation masks for the source and target image, respectively. The weights of the encoder are shared among $p$ DIR networks, which output $p$ DVFs ($\Delta_1, \Delta_2, \dots, \Delta_p$) to warp $I_{source}$ and $Seg_{source}$. The network is trained to simultaneously minimize $p$ loss vectors $[L_{ImageSimilarity}, L_{DVFSmoothness}, L_{SegSimilarity}]$ using MO learning.}
    \label{fig:approach}
    \vspace{-5mm}
\end{figure}

Our deep-learning-based MO DIR implementation embeds a DIR network in the MO learning framework proposed in \citet{deist2023multi}. For DIR, we selected VoxelMorph \cite{voxelmorph}, a well-established network that uses an encoder-decoder architecture to predict a DVF and that forms the basis of many subsequently proposed deep-learning-based DIR approaches. We selected the MO learning framework of \citet{deist2023multi} for two reasons: a) it trains neural networks multi-objectively through hypervolume (HV) maximization, a process that inherently ensures Pareto optimality\footnote{If the HV is maximal, all solutions are Pareto optimal.} and diversity among the solutions; and b) to the best of our knowledge, it is the only MO approach that allows training neural networks multi-objectively without a priori knowledge of the exact preference between the objectives. The latter is crucial for DIR, because earlier work suggests that the preferred trade-off between objectives may differ between image pairs and may only become known a posteriori, after inspecting multiple solutions \cite{pirpinia2017feasibility}.
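The HV of an approximation set is the volume of the region in objective space that is dominated by the set and bounded by a reference point. The brute-force Python sketch below computes it exactly by splitting objective space into grid cells. It is meant for illustration only, as dedicated HV algorithms are far more efficient. The function name, reference point, and loss values are hypothetical. The second example shows that a dominated solution contributes nothing to the HV, which is why maximizing the HV pushes every solution toward the Pareto front.

\begin{verbatim}
from itertools import product
from typing import List, Sequence

def hypervolume(points: List[Sequence[float]], ref: Sequence[float]) -> float:
    """Exact hypervolume (minimization) of the union of boxes [p, ref] over all
    points p, computed on a coordinate grid. The number of grid cells grows
    as p**n, so this is only practical for small sets."""
    n = len(ref)
    # Points that do not strictly improve on the reference point add nothing.
    pts = [p for p in points if all(x < r for x, r in zip(p, ref))]
    # Grid lines per objective: every point coordinate plus the reference value.
    axes = [sorted({p[d] for p in pts} | {ref[d]}) for d in range(n)]
    volume = 0.0
    for idx in product(*(range(len(a) - 1) for a in axes)):
        lower = [axes[d][i] for d, i in enumerate(idx)]
        # A cell is dominated iff some point is <= its lower corner everywhere.
        if any(all(p[d] <= lower[d] for d in range(n)) for p in pts):
            cell = 1.0
            for d, i in enumerate(idx):
                cell *= axes[d][i + 1] - axes[d][i]
            volume += cell
    return volume

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
ref = (4.0, 4.0)
print(hypervolume(front, ref))                 # 6.0
print(hypervolume(front + [(3.0, 3.0)], ref))  # 6.0: dominated point adds nothing
\end{verbatim}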

In this paper, we aim to minimize $p$ loss vectors (corresponding to the $p$ solutions, i.e., DIR outputs, in the approximation set), each comprising three losses: $L_{ImageSimilarity}$, $L_{DVFSmoothness}$, and $L_{SegSimilarity}$. For $L_{ImageSimilarity}$, we use the normalized cross-correlation (NCC) loss. $L_{DVFSmoothness}$ is the sum of squared spatial gradients of the predicted DVF in all directions, and $L_{SegSimilarity}$ is the Dice loss between the target image's organ mask and the source image's organ mask warped by the predicted DVF (see \citet{voxelmorph} for details). The original formulation of MO learning in \citet{deist2023multi} requires $p$ separate neural networks, one for each of the $p$ solutions in the approximation set. Because training a 3D DIR network is memory intensive, this is challenging given limited GPU memory. To tackle this, we modified the original implementation by sharing the encoder weights among the $p$ DIR networks, as shown in Figure~\ref{fig:approach}. The DIR network thus predicts $p$ DIR outputs (DVFs), from which $p$ loss vectors are calculated and passed to the MO learning framework. The parameters of the DIR network are updated using a dynamic loss formulation, which, for each DIR output $i$, is defined as:
\begin{equation}
\label{eq:dynamic_loss}
    L^i = w^i_1 L^i_{ImageSimilarity} + w^i_2 L^i_{DVFSmoothness} + w^i_3 L^i_{SegSimilarity} \quad \forall i\in\{1,\dots,p\}
\vspace{-2mm}
\end{equation}
where the weights $w^i_1, w^i_2, w^i_3$ of the three losses of the $i$-th DIR output are recalculated in each iteration using the HV maximization approach described in \citet{deist2023multi}. This ensures that, at the end of training, the DIR outputs (from which the $p$ loss vectors are calculated) are close to, and diversely distributed along, the Pareto front of the three objectives.
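A minimal NumPy sketch of the three losses that make up one loss vector is given below. It rests on simplifying assumptions. It uses a global NCC, whereas VoxelMorph computes NCC over local windows. The warped source image and mask are assumed to be given, so the spatial transformer that applies the DVF is omitted. The DVF is assumed to be stored as an array of shape $(3, D, H, W)$, and the normalization of the smoothness term (mean over voxels, summed over the three directions) is a choice made for the sketch. All function names are hypothetical.

\begin{verbatim}
import numpy as np

def ncc_loss(target: np.ndarray, warped: np.ndarray, eps: float = 1e-8) -> float:
    """Negative *global* NCC between the target image and the warped source.
    (VoxelMorph uses a local, windowed NCC; global NCC keeps this short.)"""
    t = target - target.mean()
    w = warped - warped.mean()
    return -float((t * w).sum() / (np.sqrt((t ** 2).sum() * (w ** 2).sum()) + eps))

def smoothness_loss(dvf: np.ndarray) -> float:
    """dvf has shape (3, D, H, W). Mean squared forward difference of the
    displacement along each spatial axis, summed over the three axes."""
    return sum(float((np.diff(dvf, axis=ax) ** 2).mean()) for ax in (1, 2, 3))

def dice_loss(target_mask: np.ndarray, warped_mask: np.ndarray,
              eps: float = 1e-8) -> float:
    """Soft Dice loss: 1 - 2 * sum(A * B) / (sum(A) + sum(B))."""
    inter = (target_mask * warped_mask).sum()
    return float(1.0 - 2.0 * inter / (target_mask.sum() + warped_mask.sum() + eps))

def loss_vector(target, warped, target_mask, warped_mask, dvf):
    """One loss vector [L_ImageSimilarity, L_DVFSmoothness, L_SegSimilarity]."""
    return (ncc_loss(target, warped),
            smoothness_loss(dvf),
            dice_loss(target_mask, warped_mask))

rng = np.random.default_rng(0)
img = rng.random((8, 8, 8))          # hypothetical target image
mask = (img > 0.5).astype(float)     # hypothetical organ mask
print(loss_vector(img, img, mask, mask, np.zeros((3, 8, 8, 8))))
\end{verbatim}

For identical images and masks and a zero DVF, the sketch returns a loss vector of approximately $(-1, 0, 0)$, i.e., perfect correlation, a perfectly smooth DVF, and perfect overlap.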

MO DIR as described above can be understood as training $p$ DIR networks simultaneously, each with different weights for the loss terms, where the weights are selected automatically such that the HV is maximized. That said, MO DIR is fundamentally different from the traditional approach of training a single DIR network after a hyperparameter search over the loss weights. In the traditional setup, the loss weights (which correspond to a single trade-off on the approximation front) are selected a priori, based on a quantitative comparison of a single performance metric aggregated over a validation set. In MO DIR, by contrast, the selection is made a posteriori by clinical experts, based on a qualitative evaluation of multiple criteria specific to each patient.
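The automatic selection of the loss weights can be illustrated with the following Python sketch. It assumes that $w^i_j$ is proportional to $-\partial \mathrm{HV}/\partial L^i_j$, with $L^i_j$ denoting the $j$-th loss of solution $i$. This gradient is estimated here by central finite differences on a brute-force exact HV and normalized per solution. These choices, the reference point, the loss values, and all function names are assumptions for illustration. \citet{deist2023multi} describe how the weights are actually computed, including how dominated solutions are handled; in this sketch they simply receive zero weight. The reasoning behind the weighting is the chain rule. If the weights are held constant during backpropagation and left unnormalized, then $\sum_i L^i$ has the same parameter gradient as $-\mathrm{HV}$, so descending on the weighted losses ascends the HV. The per-solution normalization only rescales each solution's contribution.

\begin{verbatim}
from itertools import product
from typing import List, Sequence

def hypervolume(points: List[Sequence[float]], ref: Sequence[float]) -> float:
    """Exact HV (minimization) on a coordinate grid; only for small sets."""
    n = len(ref)
    pts = [p for p in points if all(x < r for x, r in zip(p, ref))]
    axes = [sorted({p[d] for p in pts} | {ref[d]}) for d in range(n)]
    volume = 0.0
    for idx in product(*(range(len(a) - 1) for a in axes)):
        lower = [axes[d][i] for d, i in enumerate(idx)]
        # The cell is dominated iff some point is <= its lower corner.
        if any(all(p[d] <= lower[d] for d in range(n)) for p in pts):
            cell = 1.0
            for d, i in enumerate(idx):
                cell *= axes[d][i + 1] - axes[d][i]
            volume += cell
    return volume

def hv_weights(losses: List[Sequence[float]], ref: Sequence[float],
               h: float = 1e-5, tol: float = 1e-8) -> List[List[float]]:
    """Assumed weighting: w^i_j proportional to -dHV/dL^i_j, estimated by
    central finite differences and normalized to sum to one per solution.
    Solutions with (numerically) zero HV gradient, e.g. dominated ones,
    get all-zero weights in this sketch."""
    weights = []
    for i, vec in enumerate(losses):
        grad = []
        for j in range(len(vec)):
            up = [list(v) for v in losses]
            down = [list(v) for v in losses]
            up[i][j] += h
            down[i][j] -= h
            # Worsening a loss can never increase the HV, so clip noise at 0.
            g = -(hypervolume(up, ref) - hypervolume(down, ref)) / (2 * h)
            grad.append(max(0.0, g))
        total = sum(grad)
        weights.append([g / total for g in grad] if total > tol else [0.0] * len(grad))
    return weights

def dynamic_losses(losses: List[Sequence[float]],
                   weights: List[List[float]]) -> List[float]:
    """L^i = sum_j w^i_j * L^i_j for every solution i."""
    return [sum(w * l for w, l in zip(ws, ls)) for ws, ls in zip(weights, losses)]

# Hypothetical 3-objective loss vectors of p = 3 DIR outputs and reference point.
losses = [(0.2, 0.6, 0.4), (0.4, 0.3, 0.5), (0.6, 0.5, 0.1)]
ref = (1.0, 1.0, 1.0)
w = hv_weights(losses, ref)
print(dynamic_losses(losses, w))
\end{verbatim}

Because the three example loss vectors are mutually non-dominated, each receives non-zero weights that sum to one. Appending a dominated vector would give it all-zero weights in this sketch.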

\vspace{-3mm}
\subsection{Data}
\input{2_data}