
\section{Introduction}


%%such as satellite imagery~\citep{Shermeyer_2019_CVPR_Workshops, Lu2019SatelliteIS,cornebise2022openhighresolutionsatelliteimagery}, electron microscopy~\citep{prakash2022super,de2019resolution}, and

Several application domains, especially in science and medicine, benefit tremendously from high-resolution images of the objects and phenomena of interest; examples include magnetic resonance imaging (MRI) and computed tomography (CT)~\citep{greenspan2009super, umehara2018application,zhang2017deep}. Recognizing this need, generative models for super-resolution (SR), especially diffusion models~\citep{SR3, i2sb, shang2024resdiff}, have emerged as a promising approach for generating such data.


Unlike conventional CNN-~\citep{dong2015image}, transformer-~\citep{lu2022transformer}, and GAN-based~\citep{realesrgan, DASR, BSRGAN} methods, diffusion models excel at handling the fundamentally ill-posed nature of super-resolution problems~\citep{Kabanikhin+2008+317+357}. Their generative formulation provides inherent uncertainty awareness and enhanced capacity to model complex data distributions~\citep{wang2020deep}, making them particularly suited for medical imaging. 
%where stochastic variations and ambiguous structures are prevalent.



However, practical deployment in scientific domains faces significant hurdles. Unlike natural images, high-resolution medical images require specialized equipment and substantial time and financial investment to acquire. MRI scanners, for example, require advanced hardware and prolonged scan durations, which cause patient discomfort and burden institutions. This scarcity creates a critical tension: medical images contain exceptionally complex anatomical structures that require millimeter-level reconstruction accuracy, yet data-starved models often over-rely on limited priors. Conventional direct fine-tuning (DFT) in such few-shot scenarios typically produces overconfident but inaccurate predictions, a perilous outcome in which hallucinated anatomical details could lead to clinical misdiagnosis.


To address this challenge, we propose \textbf{M}ulti-\textbf{S}tage \textbf{P}robabilistic \textbf{S}uper-\textbf{R}esolution (MSP-SR), a cascaded few-shot medical image super-resolution framework based on generative models. To mitigate the constraints imposed by scarce medical data, we develop a multi-stage learning framework in which the model is first pre-trained to extract visual features from abundant natural images, and these learned representations are then transferred and adapted to the medical imaging context. We use SR3 (Super-Resolution via Repeated Refinement)~\citep{SR3} as the foundational architecture, employ ControlNet~\citep{controlnet} to facilitate transfer learning, and introduce additional training constraints through loss penalties. These mechanisms promote more accurate data generation by maintaining appropriate uncertainty levels, particularly in ambiguous regions where limited training data provides insufficient guidance for reconstruction.


% Experiment results show our framework to achieve superior super-resolution performance, even when confronted with the challenging few-shot medical imaging scenarios.




%%%%%%%%TODO%%%%%%
Our framework uses a three-stage training procedure that draws on different datasets to progressively specialize the model. The initial Out-of-Domain (OOD) pre-training stage trains SR3~\citep{SR3} on low-resolution images from the COCO dataset~\citep{COCO}, extracting generalized visual features from diverse natural images. The subsequent In-Domain (ID) stage adapts the model to medical imaging characteristics using low-resolution T2-weighted scans from the IXI dataset~\citep{IXI}, which contains MR images from healthy subjects. The final Target-Domain (TD) stage fine-tunes the model on specific low-resolution/high-resolution (LR-HR) pairs from the T2-weighted FastMRI~\citep{fastmri} and BrainTumor~\citep{braintumor} datasets, along with T1-weighted OASIS~\citep{oasis} scans. This dataset progression enables precise brain MRI super-resolution while maintaining cross-modality generalization through controlled domain transfer.
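
As a schematic reading of this cascade, in notation introduced here purely for illustration rather than as a formal definition of the method, let $\theta$ denote the model parameters and let $\mathcal{D}_{\mathrm{OOD}}$, $\mathcal{D}_{\mathrm{ID}}$, and $\mathcal{D}_{\mathrm{TD}}$ denote the datasets used in the three stages. Each stage is then initialized from the parameters produced by the previous one:
\[
\theta_{\mathrm{init}}
\;\stackrel{\mathcal{D}_{\mathrm{OOD}}}{\longrightarrow}\;
\theta_{\mathrm{OOD}}
\;\stackrel{\mathcal{D}_{\mathrm{ID}}}{\longrightarrow}\;
\theta_{\mathrm{ID}}
\;\stackrel{\mathcal{D}_{\mathrm{TD}}}{\longrightarrow}\;
\theta_{\mathrm{TD}},
\]
where each arrow stands for training on the dataset written above it, and $\theta_{\mathrm{init}}$ is an assumed starting initialization. Skipping a stage corresponds to removing the associated arrow from this chain.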


Ablation studies validate the effectiveness of our framework by examining variants without ControlNet, without Out-of-Domain pre-training, without In-Domain fine-tuning, and with training on target data alone. Quantitative analysis shows that the full pipeline performs best in detail preservation and training stability during the TD stage. The cascaded transfer learning framework facilitates effective knowledge transfer both across domains (Out-of-Domain to In-Domain) and within domains (In-Domain to Target-Domain), providing a more accurate characterization of uncertainty and improving feature extraction in few-shot scenarios.

% \abcomment{this para is ok, but seems a bit rushed, without clearly calling out the specific ablations, and the results + their implications. Please expand the section, discuss the results in more details, including the nature of improvements/degradations due to ablations}


% \abcomment{Provide a summary of the ablation studies done on the three stages as this aspect makes the work+results strong. Also discuss ablation on the fine-tuning approach, using basic vs.~control-net based fine-tuning. Maybe also mention cyclic consistency.} \rkcomment{revised}


% \abcomment{last sentence is too generic, maybe use 2-3 sentences to paint an accurate picture of the results, e.g., (1) the multi-stage pipeline almost always does better than ablation, including dropping one/two stages; (2) control-net helps; (3) consistency based loss helps; (4) story stays the same for multiple SR levels (2x, 4x, 8x); (5) story stays the same for different target datasets with scarce HR data, etc.}.\rkcomment{revised}

In summary, the contributions of this paper are as follows:
% \abcomment{not sure what the following is ... its an even more brief version of the above, not sure how that helps anyone}
% \abcomment{lets see how the above looks, then the bellow can be polished/dropped}

\begin{enumerate}
    \item \textbf{Novel Multi-stage Learning Framework.} We propose a cascaded learning framework that achieves higher generation accuracy than other SR models under few-shot conditions.
    \item \textbf{Cross-domain Knowledge Transfer.} Our framework effectively transfers knowledge from natural image domains to medical imaging, enabling robust performance despite limited medical training data.
    \item \textbf{Uncertainty-aware Generation.} Our framework reduces dependency on limited training data, providing a more accurate characterization of uncertainty and, in turn, more accurate data generation.
    % \item \textbf{Uncertainty-aware Generation.} Our approach reduces dependency on limited training data, providing a precise characterization of uncertainty and ensuring more accurate data generation.
    
    
    % \item \textbf{Improved performance in data-scarce environments.} Our multi-stage training strategy enables the model to better capture the true data distribution in scenarios with limited high-resolution data, leading to enhanced performance.
    % \item \textbf{Improve model consistency.} Incorporating the consistency loss enhances the model's ability to generate outputs that are more consistent with the target-domain data.
    % \item \textbf{Stabilize the model convergence.} The MSP-SR framework ensures more stable convergence during model training on target-domain data.
    
\end{enumerate}

% \abcomment{discuss how the rest of the paper is organized}