\section{Methodology}

\subsection{Compressed Sensing Reconstruction}
The gold-standard reconstruction used in this study is an iterative compressed sensing (CS) approach \cite{adluruisotropic}, which solves the following optimization problem:
\begin{equation}
\arg\min_x \left\{ \lambda_1 \|Ax - d\|_2^2 + \lambda_2 \text{TV}(x) + \lambda_3 \text{BM3D}(x) \right\}
\end{equation}
where $x$ is the reconstructed image, $A$ is the forward model incorporating the coil sensitivities and the Fourier transform, $d$ is the acquired $k$-space data, $\text{TV}(x)$ is the total variation regularizer, and $\text{BM3D}(x)$ is a regularizer based on Block-Matching and 3D filtering.
\red{The weights $\lambda_1$, $\lambda_2$, and $\lambda_3$ are tuned by a hyperparameter grid search. $\lambda_1$ weights consistency with the acquired $k$-space measurements, $\lambda_2$ weights spatial smoothness while preserving edges, and $\lambda_3$ weights the non-local self-similarity regularization.}

% The parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ control the relative weights of data consistency and regularization terms. TV and BM3D exploits both local and non-local image structure respectively to recover high-quality reconstructions from undersampled k-space data.

\subsection{Deep Image Prior: Core Concept}
The deep image prior (DIP) framework takes a fundamentally different approach to image reconstruction: it uses an untrained neural network as a parameterized prior. The core idea can be expressed through the following optimization \cite{ulyanov2018deep}:
\begin{equation}
\arg\min_{\theta} \|f_\theta(z) - \tilde{x}\|_2^2
\end{equation}
where $f_\theta$ represents a convolutional neural network with parameters $\theta$, $z$ is a fixed random input, and $\tilde{x}$ is the degraded image. 
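Once the optimization is stopped, the restored image is read off as the network output. In the notation above, and with the hat symbols introduced here for illustration only:
\begin{equation}
\hat{x} = f_{\hat{\theta}}(z), \qquad \hat{\theta} \approx \arg\min_{\theta} \|f_\theta(z) - \tilde{x}\|_2^2
\end{equation}
The approximation sign reflects that, in \cite{ulyanov2018deep}, the number of iterations is limited so that the network does not eventually fit the degradation itself.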
% Unlike traditional deep learning approaches, DIP does not require training data, instead leveraging the network architecture itself as an image prior. 
% This approach is particularly effective because the network's structure inherently favors natural image statistics over noise and artifacts.

\subsection{DIP for MRI Reconstruction}
Adapting the DIP framework to MRI reconstruction requires incorporating the physics of MRI acquisition. The vanilla DIP formulation for MRI can be expressed as:

\begin{equation}
\arg\min_{\theta} \sum_{c=1}^{N_c} \|A_c f_\theta(z) - y_c\|_2^2
\end{equation}

where \red{$N_c$ is the number of receiver coils used in the MRI acquisition,} $A_c$ is the forward model for coil $c$, which includes the undersampling mask, the coil sensitivity, and the Fourier transform, and $y_c$ is the acquired $k$-space data of coil $c$. The network $f_\theta$ learns to map a fixed random noise input $z$ to the reconstructed image while maintaining consistency with the acquired data across all coils. This baseline approach, which we refer to as vanilla DIP, serves as the foundation for the subsequent methodological developments.
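One way to make the forward model explicit, assuming a Cartesian, sensitivity-encoded acquisition, is to factor it as follows (the symbols $M$, $\mathcal{F}$, and $S_c$ are notation introduced here for illustration):
\begin{equation}
A_c = M \mathcal{F} S_c
\end{equation}
where $S_c$ multiplies the image pointwise by the sensitivity map of coil $c$, $\mathcal{F}$ is the Fourier transform, and $M$ is the undersampling mask that retains only the acquired $k$-space locations.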

\subsection{Reference-Guided DIP}
Reference-guided DIP replaces the random-noise input $z$ with the zero-filled reconstruction. \cyan{The zero-filled reconstruction is the initial image obtained by setting the unsampled $k$-space locations to zero and applying the inverse Fourier transform directly to the undersampled data.} The optimization problem has the same form as in vanilla DIP. Conditioning the network in this way helps guide the optimization towards more plausible solutions in the early stages of reconstruction \cite{zhao2020reference}.
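As a sketch of the zero-filled input, and assuming a Cartesian sampling mask, let $\tilde{y}_c$ denote the $k$-space data of coil $c$ with unsampled locations set to zero, and let $\mathcal{F}^{-1}$ denote the inverse Fourier transform (both symbols are introduced here for illustration). The per-coil zero-filled image is then
\begin{equation}
x_c^{\text{ZF}} = \mathcal{F}^{-1} \tilde{y}_c
\end{equation}
How the coil images $x_c^{\text{ZF}}$ are combined into a single input $z$ is not fixed by this expression; a sensitivity-weighted sum and a root-sum-of-squares combination are two common choices.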
% The key distinction lies in the initialization strategy. The network is conditioned on a prior, allowing the architecture to directly leverage structural information from the initial reconstruction while maintaining data consistency. 
% This conditioning helps guide the optimization process towards more plausible solutions in the early stages of reconstruction.
% \begin{equation}
% \arg\min_{\theta} \sum_{c=1}^{N_c} |A_c f_\theta(z) - y_c|_2^2
% \end{equation}
% The key distinction lies in the initialization strategy and network conditioning. The network is conditioned on a prior, allowing the architecture to directly leverage structural information from the initial reconstruction while maintaining data consistency. This conditioning helps guide the optimization process towards more plausible solutions in the early stages of reconstruction.

\subsection{DIP-TV}
DIP-TV combines DIP with total variation (TV) regularization. The optimization problem is formulated as \cite{liu2019image}:
\begin{equation}
\arg\min_{\theta} \sum_{c=1}^{N_c} \|A_c f_\theta(z) - y_c\|_2^2 + \lambda_{\text{TV}} \|\nabla f_\theta(z)\|_1
\end{equation}
where $\lambda_{\text{TV}}$ is the TV regularization weight and \red{$\nabla f_{\theta}(z)$ denotes the spatial gradient of the CNN output, which is computed via finite differences in the image domain.} The input $z$ can be either random noise or the zero-filled reconstruction. The total variation term preserves image edges while promoting piecewise smoothness, complementing DIP's ability to capture natural image statistics.
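For concreteness, one common discretization of the TV term is the anisotropic form with forward differences. Writing $x = f_\theta(z)$ with pixel values $x_{i,j}$ (an illustrative choice; the isotropic variant and the boundary handling are not fixed by the formulation above):
\begin{equation}
\|\nabla x\|_1 = \sum_{i,j} \left( |x_{i+1,j} - x_{i,j}| + |x_{i,j+1} - x_{i,j}| \right)
\end{equation}
For complex-valued MR images, $|\cdot|$ denotes the complex magnitude.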



\subsection{Self-Guided DIP}
Self-guided DIP introduces a self-regularization mechanism through its network architecture design and optimization strategy, in which the input $z$ is optimized jointly with the network parameters $\theta$. The optimization problem becomes \cite{bell2023robust}:
\begin{equation}
\arg\min_{\theta, z} \sum_{c=1}^{N_c} \|A_c \mathbb{E}_{\eta}[f_\theta(z + \eta)] - y_c\|_2^2 + \alpha\|\mathbb{E}_{\eta}[f_\theta(z + \eta)] - z\|_2^2
\end{equation}
where $\eta$ represents random perturbations, $\mathbb{E}_{\eta}$ denotes the expectation over these perturbations, \red{and $\alpha$ is the weighting parameter that controls the strength of the self-regularization term.} \cyan{The second term acts as a self-guidance mechanism: it penalizes the distance between the perturbation-averaged network output and the input $z$, which encourages the output to remain stable under small random perturbations of the input. This adds stability by acting as an implicit denoising mechanism, which helps prevent overfitting to noise.} The final reconstruction is obtained through:
\begin{equation}
x^* = \mathbb{E}_{\eta \sim P_{\eta}}[f_{\theta^*}(z^* + \eta)] 
\end{equation}
where $\theta^*$ and $z^*$ denote the optimized network parameters and input, and $\eta$ is sampled from a distribution $P_{\eta}$ (Gaussian in our experiments).
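The expectation has no closed form for a general network and is typically approximated by a sample average. A minimal sketch, assuming $K$ independent draws $\eta_k \sim P_{\eta}$ (the number of draws $K$ is a hypothetical parameter introduced here):
\begin{equation}
\mathbb{E}_{\eta}[f_\theta(z + \eta)] \approx \frac{1}{K} \sum_{k=1}^{K} f_\theta(z + \eta_k)
\end{equation}
The same estimator can be used both inside the objective during optimization and for the final reconstruction $x^*$.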
% This self-guided formulation has shown particular effectiveness in handling the noise-amplification challenges common in accelerated MRI reconstruction.
