\section{Extended Discussion}
\label{sec:ap7}
\subsection{Limitations}

While our experiments indicate that JSSL improves over conventional SSL methods, several limitations warrant discussion. First, the efficacy of JSSL depends heavily on the availability and quality of proxy datasets. Although datasets such as fastMRI contain fully-sampled data and are readily available, there may be cases in which such datasets cannot be used. This could occur when the anatomical regions of interest in the proxy datasets are not sufficiently similar to those in the target dataset, or when differences in imaging protocols and acquisition parameters introduce significant discrepancies.

For instance, in experiment set \textbf{A}, where the fastMRI prostate data served as the target domain and the brain and knee fastMRI datasets were used as proxies, the SL PROXY setup showed relatively good performance, indicating that training with similar proxy domains can still be beneficial for out-of-distribution inference. However, in experiment set \textbf{B}, where the CMRxRecon cardiac data was the target and the brain, knee, and prostate fastMRI datasets served as proxies, the performance of SL PROXY was significantly lower than that of all other methods, highlighting that SL PROXY struggles to generalize when the proxies are dissimilar to the target. In both scenarios, JSSL consistently surpassed SL PROXY, indicating that the combined supervised and self-supervised approach is more robust, regardless of how similar the proxy datasets are to the target.

Additionally, the inclusion of proxy datasets in training can introduce biases, particularly if there are substantial differences between the proxy and target domains. This bias could potentially degrade the model's performance on the target dataset, as observed in some of our supervised learning experiments.

Moreover, as with any DL-based method, JSSL's performance is influenced by the choice of loss function for each component of the JSSL loss $\mathcal{L}_{\boldsymbol{\psi}}^{\text{JSSL}}$ and by how these components are weighted. In our experiments, we employed identical dual-domain loss functions for each component and equal weighting for the SL and SSL components (see Appendix \Section{ap4}). However, different loss and weighting choices might affect JSSL's performance.
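
To make the role of the weighting concrete, the JSSL loss can be sketched as a weighted sum of its two components. The weights $\lambda_{\text{SL}}$ and $\lambda_{\text{SSL}}$ and the component symbols $\mathcal{L}_{\boldsymbol{\psi}}^{\text{SL}}$ and $\mathcal{L}_{\boldsymbol{\psi}}^{\text{SSL}}$ are illustrative notation assumed here and need not match the exact formulation used elsewhere in the paper:
\[
\mathcal{L}_{\boldsymbol{\psi}}^{\text{JSSL}} \;=\; \lambda_{\text{SL}}\,\mathcal{L}_{\boldsymbol{\psi}}^{\text{SL}} \;+\; \lambda_{\text{SSL}}\,\mathcal{L}_{\boldsymbol{\psi}}^{\text{SSL}}, \qquad \lambda_{\text{SL}},\, \lambda_{\text{SSL}} \geq 0 .
\]
Under this reading, the equal weighting used in our experiments corresponds to $\lambda_{\text{SL}} = \lambda_{\text{SSL}}$, and unequal weights would shift the emphasis between the proxy (supervised) and target (self-supervised) objectives.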

JSSL performance also depends on the partitioning strategy used for subsampled data in self-supervised learning. While we adopted a Gaussian partitioning scheme, alternative strategies might yield different results and require further exploration. The optimal partitioning scheme may vary depending on the specific characteristics of the target and proxy datasets, as well as the desired reconstruction quality.
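
As a rough illustration of what such a partitioning involves, let $\Omega$ denote the set of acquired $k$-space locations, split into two disjoint subsets $\Theta$ and $\Lambda$. These symbols, the $k$-space center $\mathbf{k}_c$, and the width $\sigma$ are assumed notation for this sketch and may differ from the definitions used elsewhere in the paper:
\[
\Omega = \Theta \cup \Lambda, \qquad \Theta \cap \Lambda = \emptyset, \qquad \Pr\left(\mathbf{k} \in \Lambda\right) \;\propto\; \exp\!\left(-\frac{\lVert \mathbf{k} - \mathbf{k}_c \rVert_2^2}{2\sigma^2}\right) \quad \text{for } \mathbf{k} \in \Omega .
\]
In a Gaussian scheme of this kind, locations near the $k$-space center are more likely to be assigned to $\Lambda$. Alternative strategies (for example, uniform selection) would change this selection density and, potentially, the relative size of the two subsets.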

Lastly, our experiments compare against only one SSL method (SSDU) and do not consider other proposed self-supervised methodologies. We restrict the comparison to SSDU because we consider it representative: most SSL-based methods are derivatives of SSDU and still employ SSL-based losses to train their models (refer to Appendix \ref{sec:ap1}). In addition, comparing against methods that train more than one model as their SSL task is outside the scope of this research, as such methods introduce additional computational difficulties and are themselves derivatives of the SSDU method. Our purpose is to compare JSSL and SSL training methods in their general forms.