\section{Additional Information}
\label{app:appendix2}

\subsection{Methods}
\label{app:appendix2-methods}
% \subsubsection{SENSE Reconstruction}
% In the main paper when referring to the use of SENSE reconstruction we  mean combining multi-coil images into a single image via a sum using the conjugate of the sensitivity maps. More specifically, given sensitivity maps $\mat{S} = (\vec{S}^{1}, \cdots, \vec{S}^{n_c})$ and multi-coil image $(\vec{z}^{1}, \cdots, \vec{z}^{n_c})$ the SENSE image is given by:
% % 
% \begin{equation*}
%     \mathcal{E}_{\mat{S}}(\vec{z}) = \sum_{k=1}^{n_c} {\mat{S}^{k}}^{*} \vec{z}^{k}.
% \end{equation*}
% % 
% \noindent
% Therefore, given multi-coil $k$-space data  $(\vec{y}^{1}, \cdots, \vec{y}^{n_c})$ we obtain the SENSE reconstruction by first transforming this data to the image domain via the inverse Fourier transform, and then applying the conjugate sensitivity map sum:
% % 
% \begin{equation*}
%     \mathcal{E}_{\mat{S}} \circ \mathcal{F}^{-1} (\vec{y}) = \sum_{k=1}^{n_c} {\mat{S}^{k}}^{*} \mathcal{F}^{-1} (\vec{y}^{k}).
% \end{equation*}
% % 
% \noindent
% Note that for dynamic data, the above operation is performed across all frames.

\subsubsection{Straight-through Estimator}
\label{app:appendix2-ste}
In the ADS component of our proposed method, we employ a straight-through estimator (STE) to binarize the predicted probabilities, which have been rescaled to match the acceleration factor and zeroed out at already sampled locations. This step generates a binary mask from continuous probability values. In the forward pass, the STE uses random uniform sampling: each predicted probability is compared against a value drawn from a uniform distribution (see Algorithm \ref{alg:ste_forward}). Binarization itself is non-differentiable, so a deterministic rule such as the top-$k$ operator would apply a hard threshold and block gradient flow. Random sampling instead yields a smoother decision boundary, which the STE approximates with a sigmoid function during the backward pass.

Note that this stochasticity not only introduces randomness during training but also allows the model to better simulate real-world scenarios where decisions are not always deterministic. Stochastic binarization is also used at inference, which maintains variability in the generated binary masks.

In the backward pass (see Algorithm \ref{alg:ste_backward}), the STE approximates the non-differentiable binarization step with a sigmoid function that has a slope of 10, enabling gradients to propagate through the discrete sampling operation and allowing end-to-end training via backpropagation. Without this smooth approximation, the hard threshold would yield zero gradients almost everywhere, preventing effective learning.
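
The two passes described above can be sketched compactly. The symbols $p_i$, $u_i$, $m_i$, $\kappa$ and $\varsigma$ are introduced here for illustration only, and the exact argument of the sigmoid is an assumption rather than a transcription of Algorithms \ref{alg:ste_forward} and \ref{alg:ste_backward}. Let $p_i \in [0, 1]$ denote the rescaled probability at location $i$ and $u_i$ a draw from the uniform distribution on $[0, 1]$. The forward pass then produces the mask entry
\begin{equation*}
    m_i = \begin{cases}
        1, & u_i \leq p_i \\
        0, & \text{otherwise}
    \end{cases}, \qquad u_i \sim \mathcal{U}(0, 1),
\end{equation*}
so that $m_i = 1$ with probability $p_i$. One way to realize the sigmoid surrogate with slope $\kappa = 10$ in the backward pass is
\begin{equation*}
    \frac{\partial m_i}{\partial p_i} \approx \frac{\partial}{\partial p_i}\, \varsigma\big(\kappa (p_i - u_i)\big) = \kappa\, \varsigma\big(\kappa (p_i - u_i)\big) \Big(1 - \varsigma\big(\kappa (p_i - u_i)\big)\Big), \qquad \varsigma(a) = \frac{1}{1 + e^{-a}}.
\end{equation*}
Under this assumed form, the surrogate gradient is largest ($\kappa / 4 = 2.5$) when $p_i = u_i$ and decays smoothly as $p_i$ moves away from the sampled threshold.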

\subsubsection{Handling Complex-Valued Operations}
\label{app:appendix2-complex}
Complex-valued data, including images, $k$-space data, and sensitivity maps, were decomposed into their real and imaginary components, then stacked along the channel dimension (with size 2) for processing by real-valued model weights. As a result, all model weights were real-valued. Operations like the Fourier transform and its inverse were applied by temporarily converting the data back to its complex form when required.
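
In symbols, and assuming for illustration that the channel dimension is placed last (the layout and the names $\vec{a}$, $\vec{b}$ are not taken from the implementation), a complex array $\vec{z} = \vec{a} + i\,\vec{b}$ with real-valued $\vec{a}, \vec{b}$ of shape $n_1 \times n_2$ is stored as a real array of shape $n_1 \times n_2 \times 2$:
\begin{equation*}
    \vec{z} \;\longmapsto\; [\vec{a},\, \vec{b}], \qquad [\vec{a},\, \vec{b}] \;\longmapsto\; \vec{a} + i\,\vec{b},
\end{equation*}
where the second map is the temporary conversion applied before a Fourier transform or its inverse.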


\subsection{Experimental Setup}
\label{app:appendix2-experiments}
\subsubsection{Dataset}
\label{app:appendix2-dataset}
As outlined in \Section{sec4.1}, we used the cine CMRxRecon challenge 2023 dataset \cite{cmrxrecon2023,Wang2021}. Specifically, the data were acquired using a 3T MRI scanner with a `TrueFISP' readout. The dataset includes short-axis (SA) views and two-, three-, and four-chamber long-axis (LA) views. Each scan consists of fully sampled, ECG-triggered multi-coil acquisitions ($n_c=10$) with 3--12 dynamic (2D + time) slices. The cardiac cycle was segmented into $n_f = 12$ temporal phases (referred to as frames in the paper), with a temporal resolution of 50 ms. The spatial resolution was $2.0 \times 2.0$ mm$^2$, with a slice thickness of 8.0 mm and a slice gap of 4.0 mm.




\subsubsection{Data Preprocessing}
\label{app:appendix2-preprocessing}
As detailed in \Section{sec2.4}, each MLP component $\mathcal{M}_{\boldsymbol{\psi_m}}$ within each cascade receives a flattened image as input, which requires a fixed input shape due to the MLP's fixed number of features. To achieve this, all data were center zero-padded to match the largest spatial size in the dataset, i.e., $(n_1, n_2) = (512, 246)$. This process involved transforming the multi-coil $k$-space data into the image domain using the inverse Fourier transform, applying center zero-padding, and then transforming the data back into the frequency domain via the Fourier transform.
\noindent
Data were normalized using the 99.5\textsuperscript{th} percentile value of the magnitude of the fully sampled autocalibration signal for each case:
\begin{equation*}
    s = \text{quantile}_{99.5}(|\vec{y}_{\Lambda_{\text{acs}}}|).
\end{equation*}
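\noindent
Written with operators, the padding step described above can be sketched as follows, where $\mathcal{P}$ is a hypothetical symbol for center zero-padding to $(n_1, n_2)$ and $\vec{y}^{k}$ denotes the $k$-space data of coil $k$:
\begin{equation*}
    \tilde{\vec{y}}^{k} = \mathcal{F}\big(\mathcal{P}\big(\mathcal{F}^{-1}(\vec{y}^{k})\big)\big), \qquad k = 1, \ldots, n_c.
\end{equation*}
A natural reading of the normalization, assumed here because the text does not state how $s$ is applied, is a division of the $k$-space data by the scalar $s$, i.e., $\vec{y} \mapsto \vec{y} / s$. Because the Fourier transform is linear, this would scale the corresponding images by the same factor $1/s$.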

\subsubsection{Sampling Schemes}
\label{app:appendix2-sampling}
In our experimental setup, we compare our proposed methodology against predetermined and random sampling schemes. Following the algorithms available in the literature \cite{YIASEMIS202433}, we specifically consider:

\begin{itemize}
    \item Equispaced (1D/line): Lines selected at fixed intervals based on the desired acceleration, with a randomly selected offset.
    \item Random (1D/line): Lines selected from a uniform distribution up to the desired acceleration.
    \item Gaussian 1D (1D/line): Lines selected from a 1D Gaussian distribution with mean $\mu = n_2/2$ and standard deviation $\sigma = 4\sqrt{\mu}$.
    \item Gaussian 2D (2D/point): Samples drawn from a 2D Gaussian distribution with mean $\boldsymbol{\mu} = (n_1/2, n_2/2)$ and standard deviation $\boldsymbol{\Sigma} = 4 \mat{I} \sqrt{\boldsymbol{\mu}}^{T}$.
    \item Radial (2D/point): Samples selected in a radial fashion on the Cartesian grid using the CIRCUS method.
\end{itemize}
\noindent
For the above schemes, frame-specific experiments used a distinct pattern per frame, each generated with an arbitrary random seed, whereas unified experiments applied the same pattern to all frames.
\noindent
We also evaluated $k$t schemes that generate dynamic sampling with temporal interleaving, avoiding repeated sampling in adjacent frames: 
\begin{itemize}
    \item $k$t-Equispaced
    \item $k$t-Gaussian 1D
    \item $k$t-Radial
\end{itemize}
\noindent
During training, arbitrary patterns were generated without fixed seeds to maximize model exposure to varied data. At inference, the seed for generating patterns was fixed for each scan/patient to ensure consistency (e.g. during validation).
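
As a worked example of the Gaussian parameters listed above, suppose the patterns are generated on the padded grid $(n_1, n_2) = (512, 246)$; this is an assumption, since the grid on which the patterns are drawn is not stated here. For Gaussian 1D this gives
\begin{equation*}
    \mu = \frac{n_2}{2} = 123, \qquad \sigma = 4\sqrt{123} \approx 44.4,
\end{equation*}
and for Gaussian 2D, reading the expression for $\boldsymbol{\Sigma}$ as one standard deviation per axis (also an assumption),
\begin{equation*}
    \boldsymbol{\mu} = (256,\, 123), \qquad 4\sqrt{256} = 64, \qquad 4\sqrt{123} \approx 44.4.
\end{equation*}
In both cases the samples concentrate around the center of $k$-space, with a spread that grows only with the square root of the grid size.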


\subsubsection{Loss Function Definitions}
\label{app:appendix2-loss}
This study uses multiple loss components, computed in either the image domain or the frequency domain, all of which are drawn from established literature. They are defined as follows:

\begin{itemize}
    \item Structural Similarity Index Measure (SSIM) Loss
    
        \begin{equation*}
                \mathcal{L}_\text{SSIM} := 1 - \text{SSIM}, \quad \text{SSIM}(\vec{z},\,\vec{w}) =
            \frac{1}{N}\sum_{i=1}^{N} \frac{(2\mu_{\vec{z}_i}\mu_{\vec{w}_i} + \gamma_1)(2\sigma_{\vec{z}_i\vec{w}_i} + \gamma_2)}{({\mu^2_{\vec{z}_i}} +{\mu^2_{\vec{w}_i}} + \gamma_1)({\sigma^2_{\vec{z}_i}} + {\sigma^2_{\vec{w}_i}} + \gamma_2)},
        \end{equation*}
    
        where $\vec{z}_i, \vec{w}_i$, $i=1,\ldots,N$, represent $7\times 7$ square windows of $\vec{z}, \vec{w}$, respectively, and $\gamma_1 = 0.01$, $\gamma_2 = 0.03$. Additionally, $\mu_{\vec{z}_i}$ and $\mu_{\vec{w}_i}$ denote the means of each window, and $\sigma_{\vec{z}_i}$ and $\sigma_{\vec{w}_i}$ the corresponding standard deviations. Lastly, $\sigma_{\vec{z}_i\vec{w}_i}$ represents the covariance between $\vec{z}_i$ and $\vec{w}_i$.

    \item Structural Similarity Index Measure 3D (SSIM3D) Loss

        \begin{equation*}
                \mathcal{L}_\text{SSIM3D} := 1 - \text{SSIM3D},
            \label{eq:ssim_metric} 
        \end{equation*}

    where SSIM3D follows the same definition as SSIM, but replacing the $7 \times 7$ windows with cubic windows $7 \times 7 \times 7$.
    
    \item Mean Absolute Error ($L_1$) Loss
    \begin{equation*}
        \mathcal{L}_1(\vec{z},\,\vec{w}) = || \vec{z} - \vec{w} ||_1 = \sum_{i=1}^n |z_{i} - w_{i}|
    \end{equation*}
    
    \item High Frequency Error Norm (HFEN)

    \begin{equation*}
        \mathcal{L}_{\text{HFEN}} := {\text{HFEN}}, \quad  {\text{HFEN}}(\vec{z},\,\vec{w})  = \, \frac{|| \mathcal{G}(\vec{z}) - \mathcal{G}(\vec{w}) ||_1}{||\mathcal{G}(\vec{w})||_1},
        \label{eq:hfen}
    \end{equation*}

     where $\mathcal{G}$ is a Laplacian-of-Gaussian filter with a kernel of size $15\times 15$ and a standard deviation of 2.5.

    \item Normalized Mean Absolute Error (NMAE)
    \begin{equation*}
        \mathcal{L}_{\text{NMAE}} := \text{NMAE}, \quad  \text{NMAE}(\vec{z},\, \vec{w})\,= \, \frac{||\vec{z}\,-\,\vec{w}||_1}{||\vec{z}||_1}\,= \, \frac{\sum_{i=1}^n |z_{i} - {w}_{i}|}{\sum_{i=1}^n |z_{i}|}.
        \label{eq:nmae}
    \end{equation*}

\end{itemize}
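\noindent
For a small numerical illustration of the $L_1$ and NMAE definitions above, take the hypothetical vectors $\vec{z} = (1, 2, 3)$ and $\vec{w} = (1, 1, 5)$:
\begin{equation*}
    \mathcal{L}_1(\vec{z},\,\vec{w}) = |1 - 1| + |2 - 1| + |3 - 5| = 3, \qquad
    \text{NMAE}(\vec{z},\,\vec{w}) = \frac{3}{|1| + |2| + |3|} = \frac{3}{6} = 0.5.
\end{equation*}
Because NMAE normalizes by its first argument, it is not symmetric: swapping the arguments gives $\text{NMAE}(\vec{w},\,\vec{z}) = 3/7 \approx 0.43$.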


\subsubsection{Selection of Optimal Model Checkpoints}
Results were obtained using the best-performing model checkpoints, selected based on validation set performance using the SSIM metric.

\subsubsection{Significance Testing}
\label{app:appendix2-aso}
In our study, we used the almost stochastic order (ASO) test \cite{dror2019deep} with a significance level of $\alpha=0.05$ to compare reconstruction metrics between models due to its robustness in handling complex data distributions. Traditional parametric tests, such as the t-test, assume that the differences between models follow a normal distribution and have equal variances. However, these assumptions are not always valid in deep learning contexts where data distributions can be highly irregular and performance metrics can be influenced by various stochastic factors. ASO does not rely on such assumptions. Instead, it evaluates the degree to which one distribution stochastically dominates another, providing a more reliable assessment of significance when comparing models. This approach is particularly suitable for deep learning applications where performance metrics can be non-normally distributed and vary across experiments.
% By using ASO, we ensure a more accurate evaluation of model differences, accounting for the complexities and variabilities inherent in neural network performance.


\subsection{Reconstruction Model Robustness Experiments}
\label{app:appendix2-robustness-experiments}
To assess the robustness of our end-to-end pipeline, we repeated the comparative studies using a state-of-the-art reconstruction model, specifically the Model-based neural network with enhanced deep learned regularizers (MEDL-Net) \cite{qiao2023medl}, instead of vSHARP. Below, we provide details on the optimization process and architectural design.

\subsubsection{Optimization}
\label{app:appendix2-optimization}
The models were developed in PyTorch \cite{paszke2019pytorch}, following the same training scheme as our main experiments. We used the Adam optimizer with an initial learning rate of \(1 \times 10^{-3}\), which increased linearly to \(3 \times 10^{-3}\) over 2,000 iterations and subsequently decreased by 20\% every 10,000 iterations, over a total of 52,000 iterations. Experiments were conducted on single NVIDIA A6000 or A100 GPUs with a batch size of 1.
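
One reading of this schedule can be written explicitly. It assumes that the decay intervals are counted from the start of training and that the decay is applied as a step function; both are assumptions, and $\eta_i$ is notation introduced here for the learning rate at iteration $i$:
\begin{equation*}
    \eta_i = \begin{cases}
        10^{-3} + 2 \times 10^{-3}\, \dfrac{i}{2000}, & 0 \leq i \leq 2000 \\[2ex]
        3 \times 10^{-3} \cdot 0.8^{\lfloor i / 10000 \rfloor}, & 2000 < i \leq 52000
    \end{cases}.
\end{equation*}
Under this reading, the learning rate at the final iteration would be $3 \times 10^{-3} \cdot 0.8^{5} \approx 9.8 \times 10^{-4}$, i.e., close to its initial value.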

For training, we adopted the loss function proposed by the authors in the original publication \cite{qiao2023medl}, computed exclusively in the image domain:
% 
\begin{equation*}
    \begin{split}
    \mathcal{L} = \sum_{j=1}^{T} w_j \sum_{t=1}^{n_f}  & \mathcal{L}_\text{MSE}(\hat{\vec{x}}_t^{(j)}, \vec{x}_t^*)   , \quad
    w_j = \Big\{\begin{array}{lr}
        0.1, & j= 1, \cdots, T-1\\
        1, & j=T
        \end{array}.
    \end{split}
\end{equation*}
\noindent
where $\{\hat{\vec{x}}^{(j)}\}_{j=1}^{T}$ denotes the predicted dynamic images from MEDL's unrolled steps, and $\mathcal{L}_\text{MSE}$ the mean squared error loss function.
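
Substituting the weights $w_j$ makes the structure of this loss explicit: the intermediate unrolled steps contribute with weight $0.1$ and only the final step with full weight,
\begin{equation*}
    \mathcal{L} = 0.1 \sum_{j=1}^{T-1} \sum_{t=1}^{n_f} \mathcal{L}_\text{MSE}(\hat{\vec{x}}_t^{(j)}, \vec{x}_t^*) + \sum_{t=1}^{n_f} \mathcal{L}_\text{MSE}(\hat{\vec{x}}_t^{(T)}, \vec{x}_t^*).
\end{equation*}
Here the mean squared error is taken in its usual form for vectors with $n$ entries (the normalization by $n$ is the standard convention and is assumed rather than quoted from the original publication):
\begin{equation*}
    \mathcal{L}_\text{MSE}(\vec{z},\,\vec{w}) = \frac{1}{n} || \vec{z} - \vec{w} ||_2^2 = \frac{1}{n} \sum_{i=1}^n |z_{i} - w_{i}|^2.
\end{equation*}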


\subsubsection{Hyperparameter Settings} 
\label{app:appendix2-hyperparameters}
For sensitivity map estimation, we used the same SMP configuration as in our adaptive sampling experiments. Similarly, the ADS module in the adaptive sampling experiments was configured identically to that in the vSHARP-based reconstruction experiments (see Section II.F.3 of the main paper). For MEDL, we used the default hyperparameters specified in the corresponding publication \cite{qiao2023medl}.