
\section{Experiments}
\label{sec:sec4}

\subsection{Dataset}
\label{sec:sec4.1}

We used the CMRxRecon dataset \cite{cmrxrecon2023}, which comprises 4D multi-coil cine cardiac $k$-space data from 473 scans from distinct patients (3,185 2D dynamic slices in total). The data were split into training, validation, and test sets of 251, 111, and 111 scans, respectively. For more details, refer to \Appendix{appendix2-dataset}. In addition, we evaluated generalization on an unseen ventricular outflow tract / aortic cine dataset from CMRxRecon 2025 \cite{b6xs-gv29-25} (44 scans), which differs in anatomical view and motion characteristics and was not observed during training.


\subsection{Comparative \& Ablation Studies}
\label{sec:sec4.3}
To validate our E2E-ADS-Recon, particularly with respect to adaptive sampling, we evaluate various sampling strategies under both frame-specific and unified settings:

\begin{enumerate}[leftmargin=0em,label=]
    \item \textbf{Learned}: 
    \begin{enumerate}[leftmargin=1.5em]
    % 
    \item Our pipeline employing different initializations: (i) ACS-initiated ($\Lambda_0 = \Lambda_{\text{acs}}$) (Adpt-AcsInit), (ii) equispaced with $\frac{|\Omega|}{|\Lambda_{0}|} = R - 4$ (Adpt-EqInit-I), (iii) equispaced with $\frac{|\Omega|}{|\Lambda_{0}|} = R - 2$ (Adpt-EqInit-II), and (iv) $k$t-equispaced with $\frac{|\Omega|}{|\Lambda_{0}|} = R - 2$ (Adpt-$k$tEqInit-II). Configurations (ii)--(iv) satisfy $\Lambda_{\text{acs}} \subset \Lambda_0$ and are applicable only to 1D sampling.
    % 
    \item Non-adaptive optimized learned schemes, where the sampling space is parameterized and optimized end-to-end with the reconstruction model \cite{Bahadir2019} (Opt).
% 
    \end{enumerate}
    \item \textbf{Predetermined/Random}: 
    \begin{enumerate}[leftmargin=1.5em]
        % 
        \item Common non-adaptive schemes, including (i) 1D random uniform (Rand), (ii) 1D Gaussian (Gauss-1D), (iii) 2D equispaced (Equi), (iv) 2D Gaussian (Gauss-2D), and (v) radial (Rad) trajectories \cite{YIASEMIS202433}. In frame-specific experiments, a unique trajectory was used for each frame, whereas in unified settings, the same pattern was applied across all frames. 
        % 
        \item Non-adaptive schemes with temporal interleaving ($k$t schemes) \cite{tsao2003k}, including $k$t-equispaced ($k$tEqui), $k$t-Gaussian 1D ($k$tGauss-1D) and $k$t-radial ($k$tRad). 
    \end{enumerate}
\end{enumerate}
\noindent
We provide more information in \Appendix{appendix2-sampling}.
% 

For non-adaptive methods, we replace the ADS with each sampling strategy while keeping the rest of the architecture (SMP and reconstruction model) unchanged.
% 

\noindent
We also conduct ablation studies on the E2E-ADS-Recon model with the following choices:
\begin{enumerate}[leftmargin=*]
    % 
    \item A modified version of our proposed method that uses a single ADS cascade ($N=1$) instead of the two used in the comparative studies, also evaluated with different initializations.
    % 
    \item Non-uniform allocation of the sampling budget across time frames in the ADS module (Adpt-NU), compared to the equal division used in the original framework (applicable only in frame-specific settings).
\end{enumerate}
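\noindent
The difference between the two allocation schemes in the second ablation can be sketched as follows, where $\Lambda_t$ denotes the set of locations acquired in frame $t$ (notation introduced here for illustration only) and the acceleration factor $R$ is assumed to count all acquired samples. Equal division enforces the budget in every frame, whereas the non-uniform variant only constrains the total over the $n_f$ frames:
\[
\text{equal: } |\Lambda_t| = \frac{|\Omega|}{R} \ \ \forall t, \qquad
\text{non-uniform: } \sum_{t=1}^{n_f} |\Lambda_t| = \frac{n_f\,|\Omega|}{R}.
\]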
% 

\subsection{Experimental Setup}
\label{sec:sec4.2}

\noindent \textbf{Optimization} Models were implemented in PyTorch \cite{paszke2019pytorch} and trained with Adam \cite{kingma2014adam} for 52k iterations. The learning rate started at $1 \times 10^{-3}$, increased linearly to $3 \times 10^{-3}$ over the first 2k iterations, and was then reduced by 20\% every 10k iterations. Each experiment was run on a single NVIDIA A6000 or A100 GPU with a batch size of 1. We used a dual-domain loss strategy \cite{vsharp2023}, combining image- and frequency-domain losses:
% 
\begingroup
\setlength{\abovedisplayskip}{2pt}    % space above
\setlength{\belowdisplayskip}{2pt}    % space below
{
\begin{equation}
    \begin{gathered}
    \mathcal{L} = \sum_{j=1}^{T} w_j \Big[ \sum_{t=1}^{n_f} 
    \big( \mathcal{L}_\text{SSIM}(\hat{\vec{x}}_t^{(j)}, \vec{x}_t^*) + \mathcal{L}_1(\hat{\vec{x}}_t^{(j)}, \vec{x}_t^*) +  \mathcal{L}_\text{HFEN}(\hat{\vec{x}}_t^{(j)}, \vec{x}_t^*) \big) \\ +  \mathcal{L}_\text{SSIM3D}(\hat{\vec{x}}^{(j)}, \vec{x}^*) \Big] +
     3\cdot \mathcal{L}_\text{NMAE}(\hat{\vec{y}}, \vec{y}), \quad w_j = 10^{(j-T)/(T-1)}
    \end{gathered}
\tag{\theequation}
\stepcounter{equation}
\end{equation}
}
\endgroup
% 
\noindent
where $\{\hat{\vec{x}}^{(j)}\}_{j=1}^{T}$ denotes the dynamic images predicted at vSHARP's $T$ unrolled steps, and $\hat{\vec{y}}$ represents the predicted $k$-space data. The choice of individual loss components and their respective weights follows the vSHARP training protocol applied to the CMRxRecon 2023 challenge \cite{vsharp2023, Yiasemis2024,lyu2024state}. Each loss component is defined in \Appendix{appendix2-loss}.
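As a worked illustration of the step weights, for the $T=8$ unrolled steps used in our experiments, $w_j = 10^{(j-8)/7}$ gives
\[
(w_1, \dots, w_8) \approx (0.100,\ 0.139,\ 0.193,\ 0.268,\ 0.373,\ 0.518,\ 0.720,\ 1.000),
\]
so consecutive weights grow by a constant factor of $10^{1/7} \approx 1.39$, and the final prediction is weighted ten times more heavily than the first.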

\vspace{3pt} \noindent \textbf{Hyperparameter Settings}  
In our adaptive sampling experiments, we configured the ADS sampler with $N=2$ cascades. Unless specified otherwise, we use image-domain encoding ADS modules. Our setup uses encoders with $l_{\text{enc}}=3$ scales and MLPs with $l_{\text{mlp}}=3$ layers each.
% For optimized sampling, we employed an identical pipeline as our end-to-end approach by replacing the sampler model with a $n_a$-dimensional parameter representing $\Omega$ ($n_a$ defined as in \Sec{sec2.4}). 
In all our experiments, the SMP module comprised a 2D U-Net with 4 scales (16, 32, 64, 128 channels), and the reconstruction model was vSHARP \cite{Yiasemis2024} with $T=8$ unrolled steps and 3D U-Net denoisers with 4 scales (16, 32, 64, 128 channels).

\vspace{3pt} \noindent \textbf{Reconstruction Model Robustness} 
\label{sec:subsec4.2.3} We repeat the (frame-specific) comparative studies outlined in \Section{sec4.3} using MEDL-Net \cite{qiao2023medl} as the reconstruction model instead of vSHARP to explore the robustness of our end-to-end pipeline. Optimization and hyperparameter choice details are specified in \Appendix{appendix2-robustness-experiments}.


\vspace{3pt} \noindent \textbf{Subsampling}
All experiments (learned or otherwise) fully sampled the $k$-space center using a fraction $r_{\text{acs}} := |\Lambda_{\text{acs}}| / |\Omega| = 4\%$ of $\Omega$. The resulting ACS data, denoted as $\tilde{\vec{y}}_{\Lambda_{\text{acs}}}$, are used for sensitivity map prediction, similar to the literature \cite{sriram2020end, peng2022deepsense, Yiasemis_2022_CVPR}. In learned sampling experiments, $\Lambda_0 = \Lambda_{\text{acs}}$, unless stated otherwise. During training, the acceleration factor was randomly chosen from $\{4\times, 6\times, 8\times\}$, while at inference we evaluated each setup at $4\times$, $6\times$, and $8\times$.
In addition, we evaluated all trained models at higher acceleration factors (10$\times$ and 12$\times$) not seen during training to assess extrapolation behavior.
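To make the resulting budget concrete, consider a minimal worked example under the assumption (made here for illustration) that the acceleration factor $R$ counts all acquired samples, including the ACS, so that $|\Omega|/R$ samples are acquired in total. The fraction of $\Omega$ left to be selected beyond the ACS is then
\[
\frac{1}{R} - r_{\text{acs}} = \frac{1}{R} - 0.04 \approx 0.210,\ 0.127,\ 0.085,\ 0.060,\ 0.043 \quad \text{for } R = 4, 6, 8, 10, 12,
\]
so at $12\times$ the ACS alone accounts for nearly half of the total budget ($0.04 \cdot 12 = 0.48$).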

\vspace{3pt} \noindent \textbf{Evaluation}
\label{sec:subsec4.2.2}
The models were assessed using the SSIM, PSNR, and NMSE metrics, as defined in the literature \cite{YIASEMIS202433}. Each image was first centrally cropped to two-thirds of each dimension to retain the region of interest and exclude background, and the metrics were then averaged per slice or frame within each scan.
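For instance, for a frame of size $n_x \times n_y$ (illustrative symbols), the central crop retains a window of roughly $\tfrac{2}{3}n_x \times \tfrac{2}{3}n_y$ pixels, that is,
\[
\frac{2}{3} \cdot \frac{2}{3} = \frac{4}{9} \approx 44\%
\]
of the image area, so the reported metrics are computed on less than half of the pixels of each frame. The exact rounding of the window size is not specified here.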
For significance testing, we use the almost stochastic order test \cite{dror2019deep, ulmer2022deep} with $\alpha = 0.05$ (see \Appendix{appendix2-aso}).
