
\section{Related Work}

\paragraph{Data-Efficient Super-Resolution.}

% single-image super-resolution (SISR) is an important task in the vision area and has achieved substantial progress in the realm of natural images, with vast effective models trained using large amounts of natural image data~\citep{zhang2017deep, dong2014learning, ledig2017photo, lu2022transformer}. 
Data-efficient super-resolution has been studied extensively in the medical imaging literature. \citet{greenspan2009super} presents an overview of early methods that required multiple low-resolution images to reconstruct a high-resolution output, in contrast to modern learned approaches~\citep{li2021review}, including our own. Early techniques employed iterative back-projection to enforce consistency with the observed low-resolution images, while newer diffusion techniques~\citep{song2023pseudoinverseguided} allow learning-based SR methods to avoid explicit modeling of the down-sampling process.

\citet{li2021review} discusses recent deep learning-based super-resolution techniques for medical imaging, where acquiring high-resolution images remains a major bottleneck. Approaches include recursive convolutional networks that limit the parameter count~\citep{kim2016deeplyrecursiveconvolutionalnetworkimage,8099781}, GANs for training on small datasets~\citep{wang2024exploiting,mansoor2018adversarial,ensemblect} despite concerns about training stability, and smaller deep models such as U-Nets~\citep{park2018computed} with additional regularization. Notably, \citet{ensemblect} draws inspiration from \citet{cyclegan} to improve consistency and reduce hallucination.

%Another point discussed in early medical SR literature is the direct modeling of the dynamics that served to down-sample low-resolution images in the first place, which varies across different medical imaging tasks. 


%% Satellite

% The satellite image SR domain shares similar challenges to its medical counterpart, featuring rare, expensive high-resolution data~\citep{9757881}. As in the medical imaging domain, interpolation through non-learned mathematical operations is considered, but its performance is limited by its lack of a model of the domain, leading researchers towards deep learning, and likewise towards the challenge of data-efficiency. ~\citep{cornebise2022openhighresolutionsatelliteimagery} proposes a single open, large-scale satellite image dataset aimed alleviating the latter issue. In doing so, they emphasize the difficulty of creating a representative dataset capable of being applied to more-specific downstream tasks. While a well-designed foundational dataset significantly increases the domain's availability, many use-cases will nonetheless wish to construct their own, more-targeted datasets.

%~\citep{Lu2019SatelliteIS} discusses a multi-scale approach to satellite super-resolution, which shares qualitative design choices with our own approach. The authors employ multiple receptive field sizes in their model, on the shared basis that global and contextual information is necessary to produce the most plausible counterpart to a low-resolution image. While their approach focuses on architecture, employing a set of three sub-networks operating on different image scales and merging their outputs, our approach focuses on training, using lower-quality down-sampled images to provide a broad basis for larger-scale features and the nature of the domain, and using multi-scale training to ensure that the final network is robust to different scales of input image.

%~\citep{Shermeyer_2019_CVPR_Workshops} examines the downstream potential of satellite super-resolution on object detection tasks, allowing for a quantitative assessment of the practical usefulness of super-resolution models on the type of domain our methodology aims for. The authors note a meaningful improvement in object detection performance, and underscore the value of being able to use affordable low-quality images in place of high-quality images in downstream tasks. In applying the same principle to model training, we aim to make these benefits more readily available. Further, their results show a clear increase in performance with the use of deeper and more advanced models, commensurate with their improvements under more conventional metrics. We hypothesize that the gains made by our diffusion-based approach should translate to further increases in downstream task performance.


% On the whole, data paucity and the difficulty of incorporating priors are generally regarded as the core challenges in medical and satellite SR. Existing work has tended to rely upon conventional data augmentation techniques, alongside more novel techniques, including the use of neural network embeddings to conduct searches for supplementary images across large natural image datasets~\citep{zhang2017deep}. Our usage of low-resolution data to augment the training process alleviates this issue, as all images used in this stage can be on-task, and the distribution of these images can better capture the ground truth distribution than a set of images all adapted from the same small high-resolution dataset. While the incorporation of specific degradation priors is not the main focus of our work, it is worth noting that the use of a ControlNet simplifies this task substantially, given that there are no assumptions made on the conditioning image provided. A guidance image can be an image segmentation, edge detection output, or pose, for instance, and the relation between such the conditioning and the output image is learned by the network itself~\citep{controlnet}.


\paragraph{Diffusion-Based Super-Resolution.}

%% i2sb~\citep{i2sb}, Palette~\citep{palette}, SR3~\citep{SR3}, DDBM~\citep{xue2022ddrm}
 
% The advent of diffusion models created numerous opportunities for advancement in super-resolution techniques. Palette~\citep{palette} provides an early demonstration of diffusion models on image restoration tasks. DPS~\citep{dps} extends traditional diffusion models to permit the solving of non-linear inverse problems, making a wider range of practical corruptions that may occur in real-world use-cases reversible. I$^2$SB ~\citep{i2sb} improves upon this by leveraging an extension to score-based models that facilitates the learning of a function which maps between two distributions of images. 


Diffusion models have driven significant advances in super-resolution. Palette~\citep{palette} and SR3~\citep{SR3} pioneered the application of diffusion models to image restoration tasks. Training-free methods such as DPS~\citep{dps} extend pre-trained diffusion models to solve non-linear inverse problems, enabling restoration across diverse real-world corruptions without task-specific training. Meanwhile, learning-based methods such as I$^2$SB~\citep{i2sb} and SinSR~\citep{sinsr} achieve improved performance through task-specific training: I$^2$SB develops a score-based framework that maps directly between two image distributions, and SinSR enables efficient single-step inference with a learned diffusion model.




%From this, we see significant potential for the use of diffusion in medical and satellite image enhancement, motivating a solution to the need for large quantities of on-task, high-resolution training data.

%, providing performance that exceeds that of standard diffusion and allowing for image recovery without a model of how corruption is applied


 % In~\citep{wang2024exploiting}, a ControlNet is used to harness the priors of an image synthesis model, achieving higher performance with the same data. The authors utilize a pre-trained world model, avoiding the need to train from scratch. We leverage a similar approach, though our ControlNet's training is carried out in two separate stages. First, low-resolution data is further downsampled, and these image pairs are used to train the ControlNet, acclimating it to the target domain. Second, a more limited number of genuine low-high pairs is used to fine-tune the ControlNet. This extension of the process allows for further improvements to data-efficiency, in cases where high quality data is especially difficult to acquire.


% ControlNet~\citep{controlnet} is a neural network architecture that connects a secondary network, which takes in an image providing information about the desired output, to a frozen diffusion model through the use of zero-initialized convolutional layers. This allows a pre-trained diffusion model's priors to be leveraged while simultaneously learning a robust process for aligning output images to the conditioning. This architecture has natural applications in the super-resolution domain, though our work is unique in specifically targeting use-cases in which data-efficiency is a core concern.

%~\citep{wang2024exploiting} makes use of a ControlNet in order to leverage the prior of an image synthesis model and achieve higher performance than otherwise possible with the same amount of data. By incorporating a world model that has been trained using significant general data and resources, the authors are able to forego the need to train an image synthesis model from scratch. We leverage a similar approach, though our ControlNet's training is carried out in two separate stages. First, low-resolution data is further downsampled, and these image pairs are used to train the ControlNet, acclimating it to the target domain. Second, a more limited number of genuine low-high pairs is used to fine-tune the ControlNet. This extension of the process allows for further improvements to data-efficiency, in cases where high quality data is especially difficult to acquire.



\paragraph{Medical Image Quality Assessment.} While traditional image quality metrics such as PSNR and SSIM provide quantitative measures of super-resolution performance, they may not fully capture anatomical accuracy or clinical utility in medical imaging applications. Recent research has proposed specialized evaluation methodologies: \citet{zhang2021impact} proposed task-driven assessment using numerical observers for diagnostic tasks, \citet{kelkar2022assessing} introduced statistical divergence metrics specific to medical images to detect anatomical hallucinations, and \citet{li2021assessing} developed methods that analyze covariance structures to identify unintended smoothing of anatomical textures. These approaches complement traditional metrics by providing deeper insight into the preservation of clinically relevant features. While our current work uses established metrics for comparison with prior methods, incorporating these medical imaging evaluation protocols is an important direction for future research to ensure anatomical fidelity in super-resolved images.