\section{Results}

\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{fig/model.png}
% \caption{Architecture used in our work: ResUNet Model with skip connections and residual connections. The Gaussian noise induction
% part corresponds to the denoiser in the loss function of the self-guided DIP.}
\caption{ResUNet architecture with skip connections, residual blocks, and Gaussian noise injection for self-guided DIP denoising.}
\label{fig:model}
\end{figure}

\subsection{Implementation Details}
The compressed sensing (CS) reconstruction was implemented in MATLAB with parallel computing. Based on an extensive grid search, we set the regularization parameters to $\lambda_1 = 0.033$, $\lambda_2 = 8.75\times10^{-6}$, and $\lambda_3 = 0.1$, with the BM3D denoising parameter $\sigma = 1.25$.
After acquiring the raw data from the scanner, we retrospectively removed motion-contaminated data by analyzing the navigator signal and retaining only samples acquired during stable respiratory phases.
The reconstruction was initialized with the zero-filled reconstruction and used an early stopping criterion with a maximum of 1000 iterations.

For the deep image prior (DIP) implementations, we used a ResUNet architecture with skip connections (Figure~\ref{fig:model}), which has demonstrated strong performance in medical image processing tasks \cite{kumar2023brain}. The network processes complex-valued input data by treating the real and imaginary components as separate channels. Empirically, we found that Leaky ReLU activations performed marginally better (in PSNR and SSIM) than ReLU or tanh activations. This improvement is likely due to Leaky ReLU's ability to preserve negative values in the complex-valued MRI data, where both the real and imaginary components contain important signal information. We used the Adam optimizer with a learning rate of $3\times10^{-4}$.
% For self-guided DIP, we set the denoiser weight $\alpha$ as 4. For TV-DIP, we set the regularizarion weight $\lambda_{\text{TV}}$ as 2. We ran all the DIP based methods upto 2500 epochs.
For self-guided DIP, we set the denoiser weight to $\alpha = 4$, which gave the best PSNR. For DIP-TV, we set the regularization weight to $\lambda_{\text{TV}} = 2$, which likewise gave the best PSNR. For the Gaussian noise $\eta$ in self-guided DIP, we used zero mean and a standard deviation of $m/2$, where $m$ is the maximum magnitude of the initial image. We ran all DIP-based methods for up to 2500 epochs.
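
For concreteness, the input representation and noise setting described above can be sketched as follows. The symbols $x_0$ (the initial image), $H$, $W$ (the image dimensions), and $\sigma_\eta$ are notation introduced here for illustration only, and we assume the noise is drawn independently for each element:
\begin{equation}
x_0 \in \mathbb{C}^{H \times W} \;\longmapsto\; \bigl[\operatorname{Re}(x_0),\, \operatorname{Im}(x_0)\bigr] \in \mathbb{R}^{2 \times H \times W},
\end{equation}
\begin{equation}
\eta \sim \mathcal{N}\bigl(0,\, \sigma_\eta^{2}\bigr), \qquad \sigma_\eta = \frac{m}{2}, \qquad m = \max_{i} \bigl| x_{0,i} \bigr| ,
\end{equation}
where $i$ indexes the pixels of the initial image.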

As a baseline comparison, we also implemented a supervised U-Net approach trained on fully sampled reconstructions \cite{van2021improvement}. The network was trained using a combination of a perceptual loss (based on VGG16 features) and the mean absolute error, with the CS reconstructions from the full acquisition time serving as ground truth. We employed a leave-one-out cross-validation strategy, training the model on 9 subjects and testing on the remaining one, iterated across all 10 subjects. Training occurred over 1000 epochs using the Adam optimizer with a learning rate of $3\times10^{-4}$.
% All experiments were run in an A100 GPU.

\red{The computational requirements vary significantly across methods. The iterative CS reconstruction required approximately 3.5 hours per volume on a 16-core CPU server with 128\,GB of RAM. The deep learning-based methods were implemented in PyTorch and run on an NVIDIA A100 GPU with 80\,GB of memory. Processing times for a full 3D volume (approximately 140 slices) were 28 minutes for Vanilla DIP, 29 minutes for Reference-Guided DIP, 37 minutes for DIP-TV, and 39 minutes for Self-Guided DIP. Memory requirements peaked at 21\,GB for Vanilla DIP and Reference-Guided DIP, 25\,GB for DIP-TV, and 35\,GB for Self-Guided DIP. For comparison, the supervised U-Net approach required 48\,GB during training and 7\,GB for inference.}

\subsection{Experimental Setup}
We conducted experiments on cardiac MRI data from 10 subjects, acquired over approximately 12--15 minutes per scan. For each subject, we generated multiple undersampled datasets by extracting the k-space data corresponding to different fractions of the acquisition time (1/6, 1/4, 1/2, and the full time). Motion-consistent data were retained through retrospective navigator gating. Each reconstruction method received identical inputs: the undersampled k-space data, the sampling mask, and the coil sensitivity maps.

\begin{figure}[h]
\centering
\includegraphics[width=1\textwidth]{fig/vis.pdf}
% \caption{Qualitative comparison of reconstruction methods at different undersampling rates. The columns show different reconstruction approaches and the rows represent increasingly aggressive undersampling at 1/2, 1/4, and 1/6 of the full acquisition time. Error maps inside the red box at the lower left of each reconstruction show the absolute difference from the ground truth.}
\caption{Comparison of LGE-MRI reconstruction methods at different acquisition times (1/2, 1/4, and 1/6 of full scan time), with error maps (red boxes) showing deviations from ground truth.}
\label{fig:qual_comparison}
\end{figure}


\setcounter{table}{0}

\begin{table}[htbp]
\centering

\setlength{\tabcolsep}{3.5pt} 
\footnotesize  % Even smaller than \small
\renewcommand{\arraystretch}{1.5}  
\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Method}} & \multicolumn{2}{c|}{\textbf{1/2 acquisition time}} & \multicolumn{2}{c|}{\textbf{1/4 acquisition time}} & \multicolumn{2}{c|}{\textbf{1/6 acquisition time}} \\
\cline{2-7}
 & \textbf{PSNR} & \textbf{SSIM} & \textbf{PSNR} & \textbf{SSIM} & \textbf{PSNR} & \textbf{SSIM} \\
\hline
\textbf{Self-Guided DIP} & $34.5\pm1.0$ & $0.912\pm0.012$ & $32.8\pm1.2$ & $0.891\pm0.015$ & $29.4\pm1.4$ & $0.862\pm0.018$ \\
\hline
\textbf{DIP-TV} & $31.8\pm1.3$ & $0.885\pm0.015$ & $29.7\pm1.5$ & $0.863\pm0.018$ & $26.8\pm1.8$ & $0.828\pm0.023$ \\
\hline
\textbf{Ref-Guided DIP} & $31.4\pm1.4$ & $0.882\pm0.016$ & $29.2\pm1.6$ & $0.858\pm0.020$ & $26.5\pm1.9$ & $0.825\pm0.024$ \\
\hline
\textbf{Vanilla DIP} & $29.5\pm1.7$ & $0.858\pm0.022$ & $27.6\pm1.9$ & $0.835\pm0.025$ & $24.8\pm2.2$ & $0.798\pm0.028$ \\
\hline
\textbf{CS recon} & $30.6\pm1.5$ & $0.873\pm0.018$ & $28.4\pm1.8$ & $0.842\pm0.023$ & $25.7\pm2.0$ & $0.812\pm0.026$ \\
\hline
\textbf{Supervised U-Net} & $32.3\pm1.2$ & $0.889\pm0.015$ & $30.1\pm1.5$ & $0.867\pm0.019$ & $27.2\pm1.7$ & $0.835\pm0.022$ \\
\hline
\end{tabular}

\green{\caption{Comparison of reconstruction quality metrics (PSNR in dB and SSIM) for different methods at various acquisition time fractions. Values shown are mean $\pm$ standard deviation across all subjects.}}
\label{tab:recon_metrics}

% \vspace{1mm}
% \raggedright
% \scriptsize Note: Values shown as mean $\pm$ standard deviation. PSNR in dB. Numbers in header row indicate acquisition time fractions.
\end{table}

\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{fig/dip_comp.pdf}
\caption{Comparison of average PSNR values across training epochs for different DIP variants. All reconstructions use data from 1/4 of the acquisition time.}
\label{fig:dip_comp}
\end{figure}

\subsection{Analysis}

Our evaluation demonstrates the effectiveness of Self-Guided DIP for accelerated LGE-MRI reconstruction across different undersampling rates.
\red{Figure~\ref{fig:qual_comparison} presents visual comparisons between all methods at varying acquisition times for a randomly selected human subject. We cropped the images to the heart region to better visualize the structural details of the region of interest.} At 1/4 of the full acquisition time, Self-Guided DIP produces reconstructions visually comparable to the full-acquisition-time CS reconstruction, whereas CS at 1/4 of the acquisition time shows significant quality degradation, particularly in the thin left-atrium wall.

For quantitative assessment, we used two metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM).
% Table~\ref{fig:metrics_table} presents these metrics across all subjects for each reconstruction method and acquisition time. 
% \red{Table~\ref{tab:recon_metrics} presents these metrics averaged across all 10 subjects for each reconstruction method and acquisition time. The values reported correspond to the optimal epoch for each method and subject. These metrics are calculated using the full reconstructed images, not just the cropped heart regions shown in the visual comparisons.}
\red{Table~1 presents these metrics averaged across all 10 subjects for each reconstruction method and acquisition time. The reported values correspond to the optimal epoch for each method and subject. The metrics are calculated on the full reconstructed images, not just the cropped heart regions shown in the visual comparisons.}
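
For reference, the standard definitions of the two metrics are recalled below. This is a sketch under stated assumptions: $x$ denotes the reference image (assumed here to be the full-acquisition-time CS reconstruction), $\hat{x}$ the reconstruction under evaluation, $N$ the number of pixels, and $x_{\max}$ the peak value of the reference. The peak value and the stabilizing constants $c_1$ and $c_2$ are placeholders, since their exact values are not specified in this section.
\begin{equation}
\mathrm{PSNR}(\hat{x}, x) = 10 \log_{10} \frac{x_{\max}^{2}}{\mathrm{MSE}(\hat{x}, x)}, \qquad \mathrm{MSE}(\hat{x}, x) = \frac{1}{N} \sum_{i=1}^{N} \bigl| \hat{x}_i - x_i \bigr|^{2},
\end{equation}
\begin{equation}
\mathrm{SSIM}(\hat{x}, x) = \frac{\bigl(2 \mu_{\hat{x}} \mu_{x} + c_1\bigr)\bigl(2 \sigma_{\hat{x} x} + c_2\bigr)}{\bigl(\mu_{\hat{x}}^{2} + \mu_{x}^{2} + c_1\bigr)\bigl(\sigma_{\hat{x}}^{2} + \sigma_{x}^{2} + c_2\bigr)},
\end{equation}
where $\mu$, $\sigma^{2}$, and $\sigma_{\hat{x} x}$ denote local means, variances, and the covariance; in the usual formulation these are computed over sliding windows and the resulting SSIM map is averaged over the image.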

Deep image prior (DIP) models tend to overfit when trained for too many epochs \cite{wang2021early}. The stability analysis (Figure~\ref{fig:dip_comp}) reveals different overfitting behaviors across the DIP variants.
We examined this by running the DIP models for up to 4000 epochs, \red{computing the PSNR at each epoch and averaging it across all subjects. This analysis indicates suitable stopping points for the different DIP variants.} Self-Guided DIP achieves the highest PSNR and remains stable for longer, but begins to decline at around 2300 epochs. Vanilla DIP, in contrast, converges early but begins to decline soon afterward. Both DIP-TV and Reference-Guided DIP show intermediate performance, with gradual degradation after peaking at around \cyan{1700} epochs. The PSNR values were calculated using the full reconstructed images.

\red{An important consideration when working with a prospective subject is choosing an appropriate stopping criterion without access to ground truth for PSNR calculation. Based on the stability analysis above, we find that 2000 epochs generally provide optimal results for Self-Guided DIP. In future work, we plan to explore metric-based automated stopping criteria, for example, monitoring the data consistency term until it stabilizes or tracking the structural consistency between consecutive reconstructions. Other promising approaches include Stein's Unbiased Risk Estimator (SURE) \cite{khan2024adaptive} and methods that leverage acquisition noise characteristics.}
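
As an illustration only, one possible way to formalize the data-consistency criterion mentioned above is sketched here; it was not evaluated in this work. All symbols are hypothetical: $A$ denotes the forward operator (sampling mask, Fourier transform, and coil sensitivities), $y$ the acquired k-space data, $x_k$ the reconstruction at epoch $k$, $\varepsilon$ a tolerance, and $P$ a patience window.
\begin{equation}
D_k = \bigl\| A x_k - y \bigr\|_2^{2}, \qquad r_k = \frac{\bigl| D_k - D_{k-1} \bigr|}{D_{k-1}},
\end{equation}
\begin{equation}
k_{\text{stop}} = \min \bigl\{ k \ge P : r_j < \varepsilon \ \text{for all}\ j \in \{k-P+1, \dots, k\} \bigr\}.
\end{equation}
Under this sketch, training would stop at the first epoch at which the relative change in the data consistency term has stayed below $\varepsilon$ for $P$ consecutive epochs. Since the data consistency term typically keeps decreasing as a DIP model overfits, suitable values of $\varepsilon$ and $P$ would have to be calibrated empirically.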
