

\section{Experiments}
\subsection{Experimental Setup}


%%crop the COCO dataset
\paragraph{Datasets for Training and Evaluation.} We evaluated our framework on three target datasets: FastMRI~\citep{fastmri}, BrainTumor~\citep{braintumor}, and OASIS~\citep{oasis}. For training, we used COCO~\citep{COCO} as the out-of-domain dataset and IXI~\citep{IXI} T2 brain MRI scans as the in-domain dataset. We evaluated performance using PSNR, SSIM~\citep{psnr}, and LPIPS~\citep{lpips} metrics on the test sets. Dataset specifications are detailed in Table \ref{tab:dataset}.
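
For reference, PSNR follows the standard definition. The sketch below assumes intensities scaled to $[0,1]$ (so the peak value is 1) and uses $\hat{x}$, $x$, and $N$ as illustrative notation for the reconstruction, the ground truth, and the number of pixels; none of these symbols is fixed by the text above:
$$\text{MSE} = \frac{1}{N}\sum_{n=1}^{N} (\hat{x}_n - x_n)^2, \qquad \text{PSNR} = 10 \log_{10}\frac{1}{\text{MSE}}\ \text{dB}.$$
Under this assumption, an MSE of $10^{-3}$ corresponds to 30\,dB, and each halving of the MSE adds about 3.01\,dB.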



\begin{table}[ht]
\caption{Dataset Specifications for Multi-stage Training in 4$\times$ Super-resolution.}
\setlength{\tabcolsep}{4.5pt}
    \centering
    \begin{tabular}{lcp{1.38cm}<{\centering}ccccc}
        \toprule
        \textbf{Name} & \textbf{Res.} & \textbf{Size$^*$}   & \textbf{Type} & \textbf{Stage}\\ \midrule
        COCO & $16 \to 64$  & 98k/100/-    &  General & OOD\\
        \midrule
        IXI & $16 \to 64$ & 60k/60/-   &  T2 &ID \\
        \midrule% This adds a horizontal line
        FastMRI & $64\to 256$ & 300/10/60  &  T2  &TD\\ 
        BrainTumor & $64 \to 256$ & 300/10/60  &  T1/T2   &TD\\   
        OASIS  &  $64 \to 256$ & 300/10/60  &  T1  &TD \\
        \bottomrule
    \end{tabular}
    
    
    \footnotesize{\textit{Note:}
     $^*$ Sample sizes indicate train/validation/test split counts. T1/T2  denotes T1-weighted or T2-weighted magnetic resonance imaging (MRI) scans.
    }
    
    
    \label{tab:dataset}
\end{table}




\paragraph{Implementation Details.} 


We performed super-resolution (SR) at $2\times$, $4\times$, and $8\times$ scales with the noise $\bm{\eta}$ set to zero. All datasets were cropped to consistent input dimensions and degraded using bicubic downsampling. Single-channel grayscale MRI images were expanded to three channels by duplication. Images were normalized to $[0,1]$ and augmented with flip and rotation transforms. The SR3 model with ControlNet conditioning was trained in three stages: (1) initial training on the out-of-domain (OOD) COCO dataset for 1M steps, (2) fine-tuning on the in-domain (ID) IXI dataset for 1M steps, and (3) final fine-tuning on the target-domain (TD) datasets for 20K steps. All experiments used the Adam optimizer with a learning rate of $1\times10^{-4}$.





\begin{table}[t]
\centering
\caption{Inference times and parameter counts for each evaluated method.}
\label{tab:efficiency}
    \begin{tabular}{lcp{2.8cm}<{\centering}ccc}
    \toprule
    Method & Parameter Count & Inference Time \\
    \midrule
    DPS & 552.81M & 117.3 s/sample \\
    I2SB & 552.80M & 57.0 s/sample \\
    SinSR & 118.59M & 1.22 s/sample \\
    MSP-SR & 136.28M & 50.0 s/sample \\
    \bottomrule
    \end{tabular}
    
  
    \footnotesize{\textit{Note:} Inference times were measured on an NVIDIA A100 GPU.}
\end{table}



\begin{table*}[ht]
    \centering
    \caption{Performance Comparison of MSP-SR Against State-of-the-Art Methods Across Multiple Datasets}
    \label{tab:baseline_result}
 
    \setlength{\tabcolsep}{3.5pt} % further reduce column spacing to fit more columns
    % \renewcommand{\arraystretch}{0.9}
    \begin{tabular}{l|ccc|ccc|ccc} % three columns per dataset
        \toprule
        & \multicolumn{3}{c|}{\textbf{FastMRI}} & \multicolumn{3}{c|}{\textbf{BrainTumor}} & \multicolumn{3}{c}{\textbf{OASIS}} \\
        \cmidrule{2-10}
        \textbf{Method} & PSNR$\uparrow$ & SSIM$\uparrow$ & LPIPS$\downarrow$ & PSNR$\uparrow$ & SSIM$\uparrow$ & LPIPS$\downarrow$ & PSNR$\uparrow$ & SSIM$\uparrow$ & LPIPS$\downarrow$ \\
        \midrule
        DPS     & 23.00 & 0.723 & 0.3678 & 20.56 & 0.620 & 0.3553 & 21.94 & 0.622 & 0.3456 \\
        SinSR   & 26.98 & 0.843 & 0.2745 & 22.79 & 0.7201 & 0.2680 & 21.95 & 0.711 & 0.2445  \\
        I2SB    & 16.01 & 0.123 & 0.5758  & 14.95 & 0.1446 & 0.6082 & 15.68 & 0.1299 & 0.6011 \\
        MSP-SR  & \textbf{28.71} & \textbf{0.846} & \textbf{0.1450} & \textbf{27.34} & \textbf{0.811} & \textbf{0.1626} & \textbf{29.03}  & \textbf{0.872} & \textbf{0.1319} \\
        \bottomrule
    \end{tabular}
      
    
    \footnotesize{\textit{Note:} Best results are highlighted in bold. LPIPS uses VGG backbone. }
\end{table*}


\subsection{Performance}


\paragraph{Performance Evaluation Against Existing Methods.} 
To demonstrate the effectiveness of the MSP-SR framework, we compared it against state-of-the-art super-resolution methods on three medical imaging datasets: FastMRI, BrainTumor, and OASIS (Table~\ref{tab:baseline_result}). The comparison includes a training-free approach (DPS~\citep{dps}) and learning-based methods (I2SB~\citep{i2sb} and SinSR~\citep{sinsr}). For the learning-based baselines, we started from their released pre-trained weights and applied direct fine-tuning (DFT) on each target dataset using approximately 300 training samples. MSP-SR outperformed all baselines on every dataset and metric. On FastMRI, which contains homogeneous T2-weighted axial slices, MSP-SR achieved a PSNR of 28.71\,dB and an SSIM of 0.846. The margins were larger on the more heterogeneous BrainTumor (both T1- and T2-weighted images) and OASIS (both axial and coronal brain MRI scans) datasets, where MSP-SR reached PSNR values of 27.34\,dB and 29.03\,dB, exceeding the strongest baseline by 4.55\,dB and 7.08\,dB, respectively.


In addition to quality metrics, we report parameter counts and inference times for all methods in Table~\ref{tab:efficiency}. SinSR is substantially faster because it operates in a latent space, whereas the other methods (including MSP-SR) operate in the ambient pixel space, which requires more computation per sample. This architectural difference accounts for most of the variation in inference time. MSP-SR offers a practical balance between quality and efficiency, achieving the best reconstruction quality at a cost of 50 seconds per sample.
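
As a quick check of this trade-off, the ratios below are computed directly from the values in Table~\ref{tab:efficiency}; they are illustrative arithmetic rather than additional measurements. Here $t$ and $P$ are shorthand, introduced only for this calculation, for inference time per sample and parameter count:
$$\begin{aligned}
\frac{t_{\text{MSP-SR}}}{t_{\text{SinSR}}} &= \frac{50.0}{1.22} \approx 41.0, &
\frac{t_{\text{DPS}}}{t_{\text{MSP-SR}}} &= \frac{117.3}{50.0} \approx 2.35, \\
\frac{t_{\text{I2SB}}}{t_{\text{MSP-SR}}} &= \frac{57.0}{50.0} = 1.14, &
\frac{P_{\text{DPS}}}{P_{\text{MSP-SR}}} &= \frac{552.81}{136.28} \approx 4.06.
\end{aligned}$$
MSP-SR is thus roughly 41 times slower than SinSR, but faster than DPS and I2SB while using about a quarter of their parameters.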






















\paragraph{Ablation Studies Analysis.} To thoroughly validate the MSP-SR framework's effectiveness and generalizability, we conducted four complementary sets of ablation experiments: (1) Domain Transfer Ablation to evaluate each training stage's contribution in cross-domain scenarios, (2) Cross-dataset Generalization to verify our approach's dataset-agnostic nature, (3) Multi-Model Validation to demonstrate applicability beyond specific models, and (4) Consistency Loss Analysis to assess our consistency regularization technique's impact.


%%%%%%%%%%%%%%main table%%%%%%%%%%%%%%%%5
\begin{table}[H]

\caption{Quantitative comparison over different frameworks on FastMRI dataset, where the bolded values represent the best value in each evaluation metric. The results demonstrate that the MSP-SR framework achieves the majority of the best results across different SR scales.}
\setlength{\tabcolsep}{1pt}
\centering
% \tabcolsep=0.08cm
% \renewcommand{\arraystretch}{1.2} 

\begin{tabular}{lcp{0.7cm}<{\centering}cccccc}
\toprule

& \multicolumn{1}{c}{\textbf{Scale}}
& \multicolumn{4}{c}{\textbf{Training Components}}
& \multicolumn{3}{c}{\textbf{Metrics}}\\

\cmidrule(lr){3-6} \cmidrule(lr){7-9}

& ~
&  OOD
&  ID
&  TD
&  CN$^*$
&  PSNR$\uparrow$%%\multirow{1}{*}{SSIM$\uparrow$}\\
&  SSIM$\uparrow$ 
&  LPIPS$\downarrow$\\

\midrule

  &   & $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$& \textbf{34.09} & \textbf{0.922} & 0.0762\\
  &   & $\checkmark$ &              & $\checkmark$ & $\checkmark$& 33.03 & 0.913 & \textbf{0.0757} \\
  &   & $\checkmark$ & $\checkmark$ & $\checkmark$ &             & 32.11 & 0.892 & --\\
  & \scriptsize $128 \to 256 $&     & $\checkmark$ & $\checkmark$ &     & 28.45  & 0.771 & 0.101\\
  &   &              &              & $\checkmark$ &             & 5.051  & 0.209 & 0.513\\
  
  
\cmidrule{1-9}
  &   & $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$& \textbf{28.71} & 0.846 &\textbf{0.1454}\\
  &   & $\checkmark$ &              & $\checkmark$ & $\checkmark$& 28.69 & \textbf{0.847} &0.1455 \\
  &   & $\checkmark$ & $\checkmark$ & $\checkmark$ &             & 27.90 & 0.814 & --\\
  & \scriptsize $64 \to 256 $&      & $\checkmark$ & $\checkmark$ &      & 27.04  & 0.764 & 0.147\\
  &   &              &              & $\checkmark$ &             & 24.31  & 0.686 &  0.173\\
  
  
\cmidrule{1-9}
  &   & $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$& \textbf{22.98} & \textbf{0.734} & \textbf{0.216}\\
  &   & $\checkmark$ &              & $\checkmark$ & $\checkmark$& 21.00 & 0.654 & 0.219 \\
  &   & $\checkmark$ & $\checkmark$ & $\checkmark$ &             & 22.43  & 0.669 &--\\
  &\scriptsize  $32 \to 256 $&      & $\checkmark$ & $\checkmark$ &      & 19.00  & 0.558 &0.271\\
  &   &              &              & $\checkmark$ &             & 17.38  & 0.477 &0.284\\


  

\bottomrule
\end{tabular}
\footnotesize{\textit{Note:} $^*$ CN indicates ControlNet fine-tuning applied with ID/TD stages.}

\label{tab:main_exp}

% \footnotesize{\textit{Note:} Best results are highlighted in bold. LPIPS uses VGG backbone. }
% \setlength{\belowcaptionskip}{10pt}
\end{table}


\begin{figure*}[ht]
    \centering
    \includegraphics[width=\textwidth]{./sec/fig/report_sr.png}
    \caption{Visualized samples for different frameworks on the 4$\times$ scale SR task. We use the COCO dataset for the Out-of-Domain stage, IXI for the In-Domain stage, and FastMRI for the Target-Domain stage. TD refers to training on FastMRI from scratch. Note that our MSP-SR (OOD-ID-TD) framework generates samples with clearer and more accurate structural details.}
    \label{fig:vis_res}
\end{figure*}


We first examined domain transfer through a systematic ablation of four components: Out-of-Domain (OOD) pre-training, In-Domain (ID) fine-tuning, ControlNet integration, and baseline Target-Domain training. Table~\ref{tab:main_exp} shows that the full multi-stage configuration achieves the best PSNR at every super-resolution scale. OOD pre-training improved PSNR by 19.8\% for the $128 \to 256$ SR task, while multi-stage fine-tuning improved PSNR by 9.4\% in the challenging $32 \to 256$ ($8\times$) setting, underscoring our framework's effectiveness in leveraging domain knowledge. Visual comparisons in Fig.~\ref{fig:vis_res} corroborate these findings: MSP-SR better preserves intricate brain features, with enhanced structural details and sharper edge reconstructions.
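
The two percentages above are relative PSNR gains. Assuming they compare the full configuration with the ID+TD row at $128 \to 256$ and with the OOD+TD+CN row at $32 \to 256$ (an inference from Table~\ref{tab:main_exp}, since the compared rows are not named explicitly), the arithmetic is:
$$\frac{34.09 - 28.45}{28.45} \approx 19.8\%, \qquad \frac{22.98 - 21.00}{21.00} \approx 9.4\%.$$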


To verify generalization across target datasets, we evaluated our approach on the BrainTumor~\citep{braintumor} and OASIS~\citep{oasis} datasets for $4\times$ super-resolution. The results in Table~\ref{table:other_dataset} show meaningful contributions from each training stage, most pronounced on BrainTumor: Out-of-Domain pre-training improved PSNR by 3.13\%, In-Domain fine-tuning by 5.68\%, and ControlNet integration by a further 1.75\%.
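
These figures are consistent with relative PSNR gains of the full configuration (27.34\,dB) over the corresponding BrainTumor rows of Table~\ref{table:other_dataset}. The row pairing below is an assumed reading of the table rather than something stated explicitly, and the first comparison also removes ControlNet:
$$\begin{aligned}
\text{OOD (vs. ID+TD):} \quad & \frac{27.34 - 26.51}{26.51} \approx 3.13\%, \\
\text{ID (vs. OOD+TD+CN):} \quad & \frac{27.34 - 25.87}{25.87} \approx 5.68\%, \\
\text{CN (vs. OOD+ID+TD):} \quad & \frac{27.34 - 26.87}{26.87} \approx 1.75\%.
\end{aligned}$$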



Beyond dataset generalization, we investigated whether our training strategy benefits other models. We conducted experiments using SinSR~\citep{sinsr} as the backbone. As shown in Table~\ref{table:sinsr_abla}, our training strategy improved SinSR on FastMRI, with a PSNR gain of 1.04\,dB (28.02\,dB versus 26.98\,dB without the ID stage), indicating that the strategy transfers to a different model architecture.

Finally, we analyzed the contribution of the consistency loss. Incorporating it improved PSNR to 29.15\,dB and SSIM to 0.859 in the $4\times$ SR task, and the visual comparison in Fig.~\ref{fig:consis} shows corresponding qualitative improvements in image detail and overall quality.






\begin{table}[htbp]

\caption{Quantitative comparison on other MRI datasets in 4$\times$ scale SR, where the bolded values represent the best value in each evaluation metric. The results demonstrate that the MSP-SR framework achieves the best results across different datasets.}
\setlength{\tabcolsep}{2pt} % Adjusted tabcolsep for closer columns
% \renewcommand{\arraystretch}{0.9}
\centering
% \tabcolsep=0.08cm

\begin{tabular}{lcp{0.5cm}<{\centering}cccccccc}
\toprule

& \multicolumn{4}{c}{\textbf{Training component}}
& \multicolumn{2}{c}{\textbf{BrainTumor}}
& \multicolumn{2}{c}{\textbf{OASIS}} \\

\cmidrule(lr){1-5}\cmidrule(lr){6-7} \cmidrule(lr){8-9}

& OOD
& ID
& TD$^*$
& CN$^*$
& PSNR$\uparrow$
& SSIM$\uparrow$
&  PSNR$\uparrow$%%\multirow{1}{*}{SSIM$\uparrow$}\\
& SSIM$\uparrow$ \\

\midrule

& $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$ & \textbf{27.34} & \textbf{0.811} &  \textbf{29.03}  & \textbf{0.872} \\
  & $\checkmark$ &  & $\checkmark$ & $\checkmark$   &  25.87 & 0.783 & 28.75 & 0.852\\
  & $\checkmark$ & $\checkmark$ & $\checkmark$ &  & 26.87  &0.783  & 29.02   & 0.857 \\
  &  & $\checkmark$ & $\checkmark$ &  &  26.51 & 0.800 & 28.52 &  0.760\\
 
 

\bottomrule
\end{tabular}

\footnotesize{\textit{Note:} $^*$ CN indicates ControlNet fine-tuning applied with ID/TD stages. TD refers to the TD fine-tuning stage where we set the notumor brain and OASIS as the target dataset here.}

\label{table:other_dataset}
\setlength{\belowcaptionskip}{10pt}
\end{table}




\begin{table}[htbp]

\caption{Quantitative comparison for SinSR over different frameworks in 4$\times$ scale SR, where the bolded values represent the best value in each evaluation metric. }
\setlength{\tabcolsep}{2pt} % Adjusted tabcolsep for closer columns
% \renewcommand{\arraystretch}{0.9}
\centering
% \tabcolsep=0.08cm
\begin{tabular}{lcp{2.5cm}<{\centering}ccccccc}
\toprule

& \multicolumn{3}{c}{\textbf{Training component}}
& \multicolumn{3}{c}{\textbf{FastMRI}} \\

\cmidrule(lr){1-4}\cmidrule(lr){5-7} 

& OOD
& ID
& TD
& PSNR$\uparrow$
& SSIM$\uparrow$
& LPIPS$\downarrow$ \\

\midrule

& $\checkmark$ & $\checkmark$ & $\checkmark$ &   \textbf{28.02} & \textbf{0.8550} &  \textbf{0.1748}  \\
 & $\checkmark$ &  & $\checkmark$ &    26.98	& 0.8430	 &  0.2745   \\
 &  & $\checkmark$ & $\checkmark$ & 	   25.14 & 0.7866	&  0.2111   \\
  & &  & $\checkmark$ &      22.54		&  0.6451& 0.2177   \\
 
 
 
\bottomrule
\end{tabular}
\label{table:sinsr_abla}
\setlength{\belowcaptionskip}{10pt}
\end{table}







\begin{table}[htbp]
    \centering
    \caption{Negative Log Likelihood (NLL) Comparison Across Training Configurations and SR Scales}
    \label{tab:loglikelihood}
 
    \setlength{\tabcolsep}{1.3pt} % control column spacing
    % \renewcommand{\arraystretch}{0.9} % control row height
    \begin{tabular}{l c c c}
        \toprule
        \multirow{2}{*}{\textbf{Method}} & 
        \multicolumn{3}{c}{\textbf{Negative Log Likelihood$\downarrow$}} \\
        \cmidrule(lr){2-4}
        & \textbf{2$\times$ SR} & \textbf{4$\times$ SR} & \textbf{8$\times$ SR} \\
        \midrule
        \textbf{Training Stages (Fig.~\ref{fig:heatmap2})} \\
        MSP-SR Framework (Fig.~\ref{fig:heatmap2}a)         & -  & 17.177   & - \\
        OOD + TD Stages (Fig.~\ref{fig:heatmap2}b) & -  & 18.681   & - \\
        Only TD Stages (Fig.~\ref{fig:heatmap2}c)        & -      & 26.509       & - \\
        \midrule
        \textbf{SR Scale (Fig.~\ref{fig:heatmap1})} \\
        MSP-SR Framework (Fig.~\ref{fig:heatmap1}a)         & 12.642  & -       & - \\
        MSP-SR Framework (Fig.~\ref{fig:heatmap1}b)         & -      & 17.177  & - \\
        MSP-SR Framework (Fig.~\ref{fig:heatmap1}c)         & -      & -       &  17.147 \\
        \bottomrule
    \end{tabular}
    
  
    \footnotesize{\textit{Note:} ``-'' indicates non-applicable due to scale-specific training.}
\end{table}













\begin{figure}[t]
    \centering
    \includegraphics[width=1.0\columnwidth]{./sec/fig/ablation_heatmap.png}
    \caption{Uncertainty Analysis Across Training Configurations. Pixel value distributions shown for: (a) MSP-SR framework, (b) OOD pre-training with TD fine-tuning, and (c) TD-only training, demonstrating a more accurate uncertainty characterization through progressive domain transfer.}
    \label{fig:heatmap2}
\end{figure}



\begin{figure}[t]
    \centering
    \includegraphics[width=1.0\columnwidth]{./sec/fig/248x_heatmap.png}
    \caption{Scale-dependent Uncertainty Analysis in Super-resolution. Probabilistic distributions at $2\times$, $4\times$, and $8\times$ scales illustrate increasing reconstruction uncertainty at higher scale factors.}
    \label{fig:heatmap1}
\end{figure}











\paragraph{Uncertainty Analysis.}

To assess how well our multi-stage training framework characterizes uncertainty, we conducted uncertainty quantification experiments across training configurations and super-resolution scales. Specifically, we compared uncertainty calibration across three training paradigms (MSP-SR (OOD+ID+TD), OOD+TD, and TD-only) and evaluated scale-dependent uncertainty patterns from $2\times$ to $8\times$ super-resolution using the negative log-likelihood (NLL) and an analysis of the predicted pixel-value distributions.


\begin{figure}[H]
    \centering
    \includegraphics[width=\columnwidth]{./sec/fig/consis.png}
    \caption{Visualized sample to verify the influence of consistency loss in 4$\times$ scale SR task. The consistency loss significantly improves reconstruction quality and detail preservation.}
    \label{fig:consis}
\end{figure}





Table~\ref{tab:loglikelihood} presents a quantitative evaluation using the negative log-likelihood (NLL) of pixel intensity distributions. For each low-resolution input, we generated 10 high-resolution predictions and constructed pixel-wise probability distributions using 256 intensity bins between 0 and 1. Mathematically, for a pixel location $(i,j)$ with true intensity value $y_{i,j}$ and estimated probability distribution $\hat{p}_{i,j}$, the pixel-wise NLL is:
$$\text{NLL}_{i,j} = -\log(\hat{p}_{i,j}(y_{i,j}))$$

Here, $\hat{p}_{i,j}$ is derived from our generative model's samples without prior knowledge of $y_{i,j}$. The final NLL averages over all pixel locations:
$$\text{NLL} = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} \text{NLL}_{i,j}$$
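
One way to make the estimate $\hat{p}_{i,j}$ concrete is the histogram form sketched below. The symbols $K$, $B$, $\hat{x}^{(k)}_{i,j}$, $b(\cdot)$, and the smoothing constant $\epsilon$ are notation assumed for illustration; in particular, the text does not say how empty bins are handled, and additive smoothing is only one common choice:
$$\hat{p}_{i,j}(v) = \frac{\epsilon + \sum_{k=1}^{K} \mathbf{1}\!\left[\, b\big(\hat{x}^{(k)}_{i,j}\big) = b(v) \right]}{K + B\,\epsilon}, \qquad b(v) = \min\!\big(\lfloor B v \rfloor,\, B-1\big),$$
with $K = 10$ samples and $B = 256$ bins on $[0,1]$. For example, ignoring smoothing ($\epsilon \to 0$), if 3 of the 10 samples fall in the bin containing $y_{i,j}$, then $\text{NLL}_{i,j} = -\log(0.3) \approx 1.20$ (natural logarithm assumed). Without some form of smoothing, a pixel whose true bin receives no samples would contribute an infinite NLL, which is why a nonzero $\epsilon$ or a similar floor is assumed here.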

At the $4\times$ scale, our MSP-SR framework achieved a lower NLL (17.177) than both OOD pre-training with TD fine-tuning (18.681) and TD-only training (26.509), indicating more accurate uncertainty characterization.

We visualized pixel-wise uncertainty through probability distribution heatmaps (Figs.~\ref{fig:heatmap2} and~\ref{fig:heatmap1}) for a 30-pixel segment marked by a yellow line in the MRI image; color intensity represents probability density, and yellow markers indicate ground-truth values.

Comparative analysis of uncertainty characteristics (Fig.~\ref{fig:heatmap2}) reveals our MSP-SR framework's superior calibration properties. Our approach demonstrates well-calibrated predictive uncertainty where predicted distributions closely follow ground truth distributions across pixel intensities, indicating accurate uncertainty estimation without systematic bias. The OOD+TD approach shows improved alignment compared to baseline but still exhibits distribution misalignment in high-intensity regions. In contrast, TD-only training exhibits severe uncertainty pathologies with distributions concentrated around incorrect values, demonstrating uncertainty collapse with false confidence in erroneous predictions.

Increasing the super-resolution scale from $2\times$ to $8\times$ produces progressively wider pixel-value distributions (Fig.~\ref{fig:heatmap1}), indicating greater reconstruction uncertainty at higher scales. At $2\times$ SR, predicted distributions remain tightly aligned with ground-truth values, while $4\times$ SR shows moderate broadening in mid-intensity ranges corresponding to gray matter regions. At $8\times$ SR, distributions become significantly dispersed at intermediate intensity values where tissue boundaries reside, while remaining more confident at extreme intensities. This scale-dependent pattern is confirmed by the variance maps (Fig.~\ref{fig:variance_map}), which show elevated uncertainty in complex brain regions such as cortical folds.

\begin{figure}[H]
    \centering
    \includegraphics[width=1\columnwidth]{./sec/fig/variance_maps.png}
    \caption{Variance maps generated from multiple inference outputs of the same diffusion model applied to $2\times$, $4\times$, and $8\times$ super-resolution tasks. The standard deviation is normalized by the intensity value to indicate the scale of variation relative to actual intensities. The variance is high in areas where the brain's features are more distinct, such as the folds and ridges of the cortex.
    }

    \label{fig:variance_map}
\end{figure}
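
The normalization described in the caption of Fig.~\ref{fig:variance_map} can be read as a per-pixel coefficient of variation. A minimal sketch, assuming the normalizing value is the per-pixel sample mean and introducing $K$, $\hat{x}^{(k)}_{i,j}$, $\mu_{i,j}$, $\sigma_{i,j}$, $c_{i,j}$, and a small stabilizer $\delta$ purely for illustration:
$$\mu_{i,j} = \frac{1}{K}\sum_{k=1}^{K} \hat{x}^{(k)}_{i,j}, \qquad \sigma_{i,j} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\big(\hat{x}^{(k)}_{i,j} - \mu_{i,j}\big)^2}, \qquad c_{i,j} = \frac{\sigma_{i,j}}{\mu_{i,j} + \delta}.$$
If the normalization instead uses the ground-truth intensity, $\mu_{i,j}$ in the denominator would be replaced by $y_{i,j}$.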
















%%%%%%%%%%%%%% table%%%%%%%%%%%%%%%%5























