\documentclass{midl}
\usepackage{float} 
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{enumitem}
\jmlrvolume{-- nnn}
\jmlryear{2026}
\jmlrworkshop{Full Paper -- MIDL 2026}
\editors{Accepted for publication at MIDL 2026}

\title[Hybrid Medical Image Enhancement]{Bridging Classical and Learned Priors: A Hybrid Framework for Medical Image Enhancement}

 \midlauthor{\Name{Peeyush Kumar Singh} \Email{s24008@students.iitmandi.ac.in}\\
  \Name{Sneha Singh} \Email{sneha@iitmandi.ac.in}\\
  \addr Indian Institute of Technology Mandi, India}

\begin{document}

\maketitle

\begin{abstract}
Medical image enhancement faces a fundamental trade-off: classical methods preserve anatomical fidelity but over-smooth fine structures, while deep learning approaches risk generating unrealistic artifacts on limited clinical data. We introduce a hybrid framework combining classical preprocessing with pretrained diffusion priors for high-quality enhancement across modalities.
Our method leverages a pretrained Stable Diffusion model without requiring domain-specific training. During inference, classical enhancement methods generate pseudo-labels. The frozen diffusion model uses its learned priors to refine fine structures, while gradient-based guidance anchors generation to the pseudo-label, preventing hallucinations. We demonstrate efficacy in ultrasound and MRI segmentation and achieve consistent improvements in multi-class cardiac structure segmentation compared to baseline models. Critical insights include: (i) pseudo-labels outperform multi-stage classical pipelines by providing differentiable guidance targets for diffusion models; (ii) testing segmentation models on enhanced images yields additional performance gains; and (iii) pseudo-label guidance strength requires domain-specific tuning to balance classical robustness with learned refinement.
With extensive evaluation across imaging modalities, we show that pretrained diffusion models can enhance medical images while preserving the interpretability and diagnostic fidelity essential for clinical deployment.
\end{abstract}

\begin{keywords}
Medical Image Enhancement, Diffusion models, Prior-guided generation, Ultrasound, MRI, Synthesis
\end{keywords}

\section{Introduction}

Medical image enhancement is critical for improving diagnostic accuracy and enabling reliable automated analysis. However, enhancement methods face a fundamental challenge: they must improve image quality while preserving anatomical fidelity and avoiding the introduction of misleading artifacts that could compromise clinical decision-making. This is particularly challenging in modalities such as ultrasound and MRI, whose inherent noise (speckle in ultrasound, Rician noise in MRI) can degrade diagnostic quality. Classical enhancement approaches have dominated medical imaging due to their interpretability. Speckle Reducing Anisotropic Diffusion (SRAD) \cite{yu2002speckle} denoises ultrasound with a diffusion process adapted to a multiplicative noise model, and Contrast Limited Adaptive Histogram Equalization (CLAHE) \cite{zuiderveld1994contrast} enhances local contrast. Adaptive histogram equalization approaches \cite{sm1977adaptive} for MRI improve tissue contrast while trying to preserve the brightness relationships that are important for radiological interpretation.
Nonetheless, these methodologies are inherently constrained by their dependence on handcrafted priors: they often excessively smooth intricate anatomical structures, encounter difficulties with significant degradation, and fail to adapt to varied clinical contexts without comprehensive parameter adjustment. Deep learning techniques have surfaced as formidable alternatives, with convolutional neural networks \cite{zhang2017beyond, ker2017deep}, generative adversarial networks \cite{you2019ct, armanious2020medgan}, and more recently, diffusion models \cite{ho2020denoising, saharia2022image} showcasing remarkable enhancement capabilities. Diffusion models have demonstrated exceptional efficacy in image restoration by effectively learning intricate data distributions \cite{song2020score}. 

Recent works have adapted diffusion models for medical image enhancement \cite{song2021solving, wolleb2022diffusion}, including approaches that train domain-specific models on paired or unpaired data. However, these trained methods face critical limitations in medical imaging: the scarcity of ground-truth clean images, the risk of generating realistic-looking but anatomically incorrect hallucinations when training data is limited, and the computational cost of training separate models for each modality. These challenges have hindered the clinical translation of diffusion-based enhancement despite its technical promise. 

We introduce a hybrid framework that combines classical methods with the refinement capabilities of pretrained diffusion models. It addresses the challenge of enhancing degraded medical images without domain-specific training while preventing anatomical hallucinations, a critical requirement for clinical adoption.
Our approach uses classical preprocessing to generate pseudo-labels that serve as differentiable guidance targets during diffusion sampling. Inspired by underwater image enhancement diffusion priors \cite{du2025uiedp}, we apply gradient-based constraints so that the frozen Stable Diffusion model \cite{rombach2022high} refines structures beyond classical limits while remaining anchored to the pseudo-label targets, preventing hallucinations. We validate our framework on ultrasound cardiac segmentation using the CAMUS dataset \cite{leclerc2019deep} and on MRI cardiac segmentation using the ACDC dataset \cite{bernard2018deep}, demonstrating that pretrained diffusion models can enhance medical images when properly constrained by domain-appropriate classical priors.\\
\textbf{Our main contributions are:}
\begin{itemize}[noitemsep, topsep=2pt, leftmargin=*]
    \item A training-free framework combining classical preprocessing with pretrained diffusion models.
    \item A gradient-based guidance mechanism constraining generation to anatomically plausible solutions.
    \item Extensive validation across modalities and five segmentation architectures.
    \item Demonstration that pretrained natural image models can enhance medical images when properly constrained by classical priors.
\end{itemize}



\begin{figure}[!ht]
\floatconts
  {fig:grid}
  {\caption{Representative samples from the evaluation datasets. 
CAMUS (columns 1--2): 2D echocardiography with left ventricle (red), 
myocardium (green), and left atrium (blue) annotations. 
ACDC (columns 3--4): cardiac MRI with right ventricle (blue), 
myocardium (green), and left ventricle (red) annotations.}}
  {\includegraphics[width=0.8\linewidth]{grid.pdf}}
\end{figure}

\section{Methodology}

\subsection{Framework Overview}

Our hybrid enhancement framework combines classical preprocessing with pretrained diffusion models through gradient-guided sampling, eliminating the need for domain-specific training while ensuring anatomical fidelity. The framework operates through a two-stage inference pipeline: (1) the \textbf{Classical Enhancement Stage} applies modality-specific preprocessing to generate pseudo-labels that serve as guidance targets, and (2) the \textbf{Diffusion Refinement Stage} leverages the frozen Stable Diffusion model to refine structural details through gradient-guided reverse diffusion while remaining anchored to the pseudo-label constraints. Because sampling is never left unconstrained, the diffusion model is prevented from hallucinating anatomically implausible structures, a paramount concern in clinical deployment. Figure~\ref{fig:grid} shows sample images from the CAMUS and ACDC datasets with their annotations, and Figure~\ref{fig:arch} illustrates the framework.

The key innovation lies in treating classical methods not as competing alternatives but as complementary guidance. Classical methods excel at removing modality-specific artifacts (speckle in ultrasound, Rician noise in MRI) and provide reliable anatomical structure, but over-smooth fine details due to hand-crafted priors. Conversely, pretrained diffusion models have learned rich natural image priors from large-scale datasets, enabling detail refinement, but lack medical domain specificity and risk generating unrealistic structures when unconstrained. Our gradient guidance mechanism bridges this gap: the diffusion model refines structures within a constrained solution space defined by classical preprocessing. Specifically, pretrained Stable Diffusion is selected because low-level features 
(edges, textures, gradients) learned from large-scale natural image datasets 
generalize across domains, and our gradient-based constraint 
explicitly anchors generation to the medical domain through classical preprocessing, 
effectively bridging the domain gap without requiring medical-specific pretraining.

\subsection{Classical Enhancement: Modality-Specific Pseudo-Label Generation}

\subsubsection{Ultrasound Enhancement (SRAD $\rightarrow$ CLAHE)}
Ultrasound suffers from multiplicative speckle and low contrast. We generate pseudo-labels via SRAD followed by CLAHE.\\
\\
\textbf{SRAD}: Speckle-adapted anisotropic diffusion \cite{yu2002speckle} evolves
\begin{equation}
\frac{\partial I}{\partial t}=\nabla\!\cdot\!\left[c(q)\nabla I\right],\quad
c(q)=\frac{1}{1+\frac{q^2-q_0^2}{q_0^2(1+q_0^2)}},
\end{equation}
with instantaneous coefficient of variation $q$ and speckle scale $q_0=50$. We run $T=25$ explicit steps with $\Delta t=0.02$.\\
\\
\textbf{CLAHE}: Local contrast is restored via tile-wise histogram equalization with clip limit 2.0 \cite{zuiderveld1994contrast}, yielding the ultrasound pseudo-label
\begin{equation}
\mathbf{y}^{\mathrm{US}}_{\mathrm{pseudo}}
=\mathrm{CLAHE}(\mathrm{SRAD}(I_0)).
\end{equation}
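For reference, the SRAD iteration described above can be sketched as a forward-Euler scheme. This is a generic discretization under stated assumptions: the iteration index $k$ and the superscript notation are introduced here for illustration, and the finite-difference stencil for the divergence is left unspecified.
\begin{equation}
I^{(k+1)} = I^{(k)} + \Delta t\,\nabla\!\cdot\!\left[c\!\left(q^{(k)}\right)\nabla I^{(k)}\right],
\qquad k=0,\dots,T-1,
\end{equation}
so that $T=25$ steps with $\Delta t=0.02$ correspond to a total diffusion time of $T\,\Delta t=0.5$. The coefficient acts as an edge detector: $c(q)=1$ when $q=q_0$ (full-strength smoothing where the local statistics match pure speckle), while $c(q)\to 0$ as $q\to\infty$ (diffusion is switched off across strong edges).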

\subsubsection{MRI Enhancement (N4ITK $\rightarrow$ NLM $\rightarrow$ CLAHE)}
MRI degradation arises from bias-field inhomogeneity and Rician noise. We apply N4ITK, NLM, and CLAHE.\\
\\
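\textbf{N4ITK}: The acquired image is modeled as $I_0=b\,u+n$, where $u$ is the bias-free image, $b$ a smooth multiplicative bias field, and $n$ additive noise. N4ITK \cite{tustison2010n4itk} iteratively estimates the bias field and returns the corrected image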
\begin{equation}
    I_{\text{N4}}=\frac{I_0}{\hat{b}},
\end{equation}
where $\hat{b}$ is a B-spline-parameterized bias field fitted in the log domain (shrink factor 4).\\
\\
\textbf{NLM}: Magnitude MRI yields Rician noise; denoising uses non-local similarity \cite{buades2011non}:
\begin{equation}
I_{\text{NLM}}(\mathbf{x})
=\frac{\sum_{\mathbf{y}} w(\mathbf{x},\mathbf{y})\,I_{\text{N4}}(\mathbf{y})}
       {\sum_{\mathbf{y}} w(\mathbf{x},\mathbf{y})},
\quad
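w(\mathbf{x},\mathbf{y})=\exp\!\left(-\frac{\|P_{\mathbf{x}}-P_{\mathbf{y}}\|^2}{h(\mathbf{x})^2}\right),
\end{equation}
where $P_{\mathbf{x}}$ and $P_{\mathbf{y}}$ are the image patches centered at $\mathbf{x}$ and $\mathbf{y}$, and the bandwidth is adapted to the estimated noise standard deviation $\sigma$ \cite{coupe2008optimized}: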
\begin{equation}
h(\mathbf{x})=\beta\sigma\sqrt{\max(1,I_{\text{N4}}(\mathbf{x})/\sigma)},\;\beta=1.2.
\end{equation}\\
\\
\textbf{CLAHE}: Final local contrast enhancement is applied to sharpen cardiac boundaries:  
\begin{equation}
\mathbf{y}^{\mathrm{MRI}}_{\mathrm{pseudo}}
=\mathrm{CLAHE}(\mathrm{NLM}(\mathrm{N4}(I_0))).
\end{equation}

Both pipelines produce anatomically consistent pseudo-labels used as soft targets for diffusion-guided refinement.
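
As a quick check of the NLM bandwidth rule above, consider a hypothetical noise level $\sigma=0.03$ (chosen only for illustration, for intensities scaled to $[0,1]$) with $\beta=1.2$:
\begin{equation}
\begin{aligned}
I_{\text{N4}}(\mathbf{x})=0.01:&\quad h = 1.2\cdot 0.03\cdot\sqrt{\max(1,\,0.01/0.03)} = 0.036,\\
I_{\text{N4}}(\mathbf{x})=0.48:&\quad h = 1.2\cdot 0.03\cdot\sqrt{\max(1,\,0.48/0.03)} = 0.144.
\end{aligned}
\end{equation}
The bandwidth therefore grows with signal intensity, so brighter tissue is smoothed more strongly, and it falls back to the floor $h=\beta\sigma$ wherever the signal is at or below the noise level.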

\begin{figure}[!ht]
\floatconts
  {fig:arch}
  {\caption{Two-stage hybrid enhancement framework combining classical priors 
with pretrained diffusion models. 
Classical preprocessing generates pseudo-labels through 
modality-specific pipelines. These pseudo-labels provide differentiable 
guidance targets during reverse diffusion sampling in latent space 
($256 \times 256 \to 32 \times 32 \times 4$ compression). 
Gradient-based guidance at each denoising step constrains the pretrained 
Stable Diffusion model toward pseudo-labels, preventing hallucinations 
while enabling learned refinement of fine structures. 
The framework requires no domain-specific training, only inference-time 
sampling with frozen pretrained components (VAE encoder/decoder, UNet).}}
  {\includegraphics[width=\linewidth]{arch.pdf}}
\end{figure}

\subsection{Gradient-Guided Diffusion Sampling}

\subsubsection{Latent Encoding via Stable Diffusion VAE}

To enable efficient diffusion, we operate in Stable Diffusion's latent space \cite{rombach2022high}. A $256\!\times\!256$ grayscale image $\mathbf{x}$ is replicated to RGB and normalized to $[-1,1]$:
\[
\mathbf{x}_{\text{norm}} = 2\,\mathrm{repeat}(\mathbf{x},3)-1.
\]

The pretrained VAE encodes
\[
\mathbf{z}=0.18215\,\mathcal{E}_{\text{VAE}}(\mathbf{x}_{\text{norm}})
\in\mathbb{R}^{4\times 32\times 32},
\]
an $8{\times}$ reduction along each spatial dimension ($256 \to 32$) with fixed latent scaling. Both input and pseudo-label use the same mapping:
\[
\mathbf{z}_{\text{in}} = 0.18215\,\mathcal{E}_{\text{VAE}}(2\mathbf{x}-1),\qquad
\mathbf{z}_{\text{pseudo}} = 0.18215\,\mathcal{E}_{\text{VAE}}(2\mathbf{y}_{\text{pseudo}}-1).
\]

Decoding applies the inverse scale:
\[
\hat{\mathbf{x}}=\tfrac{1}{2}\!\left(\mathcal{D}_{\text{VAE}}\!\left(\mathbf{z}/0.18215\right)+1\right),
\qquad
\hat{\mathbf{x}}_{\text{gray}}=\tfrac{1}{3}\!\sum_{c=1}^{3}\hat{\mathbf{x}}_c.
\]

The VAE remains frozen; its pretrained latent manifold provides a compact, stable representation for guided diffusion without domain-specific finetuning.

\subsubsection{Gradient-Based Guidance Toward Pseudo-Labels}

The key innovation of our framework is applying gradient-based guidance at each diffusion step to constrain generation toward the pseudo-label, inspired by classifier guidance \cite{dhariwal2021diffusion} and underwater image enhancement diffusion priors \cite{du2025uiedp}, but adapted for medical imaging.

At each reverse diffusion step $t \in \mathcal{T}$, after predicting $\hat{\mathbf{z}}_0$, we compute the guidance loss $\mathcal{L}_{\text{guide}} = \| \hat{\mathbf{z}}_0 - \mathbf{z}_{\text{pseudo}} \|^2$ and take a gradient step on it to obtain
\begin{equation}
\hat{\mathbf{z}}_0^{\text{guided}} = \hat{\mathbf{z}}_0 - \lambda \nabla_{\hat{\mathbf{z}}_0} \mathcal{L}_{\text{guide}} = (1 - 2\lambda)\,\hat{\mathbf{z}}_0 + 2\lambda\,\mathbf{z}_{\text{pseudo}},
\end{equation}
where $\lambda$ is the guidance scale. The DDIM update proceeds with this guided prediction:
\begin{equation}
\mathbf{z}_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \hat{\mathbf{z}}_0^{\text{guided}} + \sqrt{1 - \bar{\alpha}_{t-1}} \boldsymbol{\epsilon}_\theta(\mathbf{z}_t, t)
\end{equation}

This mechanism acts as a \textbf{soft constraint} that pulls the diffusion model toward the pseudo-label while allowing refinement through learned priors. We use $\lambda = 1200$ for ultrasound and $\lambda = 1000$ for MRI, determined empirically. This soft constraint design also provides robustness to pseudo-label quality: 
$\lambda$ controls the trust placed in pseudo-labels, so if classical preprocessing 
produces imperfect results on severely degraded inputs, the diffusion model can 
rely more on its learned priors by reducing $\lambda$, preventing catastrophic 
guidance toward incorrect anatomical targets.
Algorithm \ref{alg:guided_diffusion} shows the steps of the framework.
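
To make the guidance step concrete, the gradient of the squared norm is $\nabla_{\hat{\mathbf{z}}_0}\mathcal{L}_{\text{guide}} = 2(\hat{\mathbf{z}}_0-\mathbf{z}_{\text{pseudo}})$, so the guided prediction is an affine blend of $\hat{\mathbf{z}}_0$ and $\mathbf{z}_{\text{pseudo}}$ with weight $w=2\lambda$ on the pseudo-label. The blend is a convex interpolation only for $0\le w\le 1$: $w=0$ recovers unguided sampling and $w=1$ replaces the prediction by the pseudo-label. The reported scales are much larger than $1/2$, which suggests that the loss carries a normalization that is not written out in the expression above. As one hypothetical reading (an assumption made here purely for illustration, not a statement about the released code), averaging the loss over the $N=4\cdot 32\cdot 32=4096$ latent elements would give
\begin{equation}
\hat{\mathbf{z}}_0^{\text{guided}} = \left(1-\tfrac{2\lambda}{N}\right)\hat{\mathbf{z}}_0 + \tfrac{2\lambda}{N}\,\mathbf{z}_{\text{pseudo}},
\qquad
\tfrac{2\cdot 1200}{4096}\approx 0.586,\quad
\tfrac{2\cdot 1000}{4096}\approx 0.488,
\end{equation}
i.e.\ under this reading the pseudo-label would receive slightly more than half of the weight for ultrasound and slightly less than half for MRI.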

\begin{algorithm2e}[!ht]
\caption{Gradient-Guided Diffusion Enhancement}
\label{alg:guided_diffusion}

\KwIn{Image $\mathbf{x}$, guidance scale $\lambda$, timesteps $\mathcal{T}$}
\KwOut{Enhanced image $\hat{\mathbf{x}}$}

$\mathbf{y}_{\text{pseudo}} \leftarrow \text{ClassicalEnhance}(\mathbf{x})$\;
$\mathbf{z}_{\text{pseudo}} \leftarrow 0.18215\,\mathcal{E}_{VAE}(2\mathbf{y}_{\text{pseudo}}-1)$\;
Sample $\mathbf{z}_T \sim \mathcal{N}(0,I)$\;

\For{$t$ in $\mathcal{T}$ (reverse)}{
    $\epsilon_t \leftarrow \epsilon_\theta(\mathbf{z}_t, t)$\;
    $\hat{\mathbf{z}}_0 \leftarrow (\mathbf{z}_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_t)/\sqrt{\bar{\alpha}_t}$\;
    $\hat{\mathbf{z}}_0 \leftarrow \hat{\mathbf{z}}_0 - 2\lambda(\hat{\mathbf{z}}_0 - \mathbf{z}_{\text{pseudo}})$\;
    \If{$t > 0$}{
        $\mathbf{z}_{t-1} \leftarrow 
        \sqrt{\bar{\alpha}_{t-1}}\hat{\mathbf{z}}_0 +
        \sqrt{1-\bar{\alpha}_{t-1}}\epsilon_t$\;
    }
}

$\hat{\mathbf{x}} \leftarrow \tfrac{1}{2}(\mathcal{D}_{VAE}(\mathbf{z}_0/0.18215)+1)$\;
$\hat{\mathbf{x}} \leftarrow \text{RGB2Gray}(\hat{\mathbf{x}})$\;

\Return $\hat{\mathbf{x}}$\;
\end{algorithm2e}


\subsection{Realistic Degradation Simulation}
For ultrasound, we degrade each image with synthetic \textbf{speckle noise}, a multiplicative noise that models coherent interference:\\ 
$I_{\text{speckle}} = \text{clip}(I_0 \cdot (1 + \eta), 0, 1)$, where 
$\eta \sim \mathcal{N}(0, \sigma_n^2)$ with $\sigma_n \in [0.1, 0.3]$ sampled uniformly.\\
For MRI, we degrade each image with synthetic \textbf{Rician noise}, which arises from the magnitude reconstruction of complex MRI signals:\\
$I_{\text{Rician}} = \sqrt{(I_0 + \eta_{\text{real}})^2 + \eta_{\text{imag}}^2}$, 
where $\eta_{\text{real}}, \eta_{\text{imag}} \sim \mathcal{N}(0, \sigma_n^2)$ 
with $\sigma_n \in [0.02, 0.04]$ sampled uniformly. 
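
As a small worked example of the two degradation models (the noise draws below are hypothetical values chosen for illustration), a pixel with $I_0=0.5$ and a speckle draw $\eta=0.3$ becomes
\[
I_{\text{speckle}}=\text{clip}(0.5\cdot 1.3,\,0,\,1)=0.65,
\]
whereas a pixel with $I_0=0$ stays at $0$ for any draw, since the noise is multiplicative. Under Rician noise with $\eta_{\text{real}}=0.03$ and $\eta_{\text{imag}}=-0.04$,
\[
I_{\text{Rician}}=\sqrt{0.53^2+0.04^2}\approx 0.532 \quad (I_0=0.5),
\qquad
I_{\text{Rician}}=\sqrt{0.03^2+0.04^2}=0.05 \quad (I_0=0),
\]
which illustrates the positive bias that Rician noise introduces in low-signal regions.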

\subsection{Downstream Segmentation Evaluation}

While image quality metrics (PSNR, SSIM) quantify image fidelity, the ultimate test is whether enhancement improves clinical task performance. For cardiac imaging, accurate segmentation of the left ventricle, myocardium, and atrium is critical for diagnosis. We evaluate our method by measuring downstream segmentation accuracy across five popular architectures: \textbf{U-Net} \cite{ronneberger2015unetconvolutionalnetworksbiomedical}, \textbf{Attention U-Net} \cite{oktay2018attentionunetlearninglook},  \textbf{UNETR} \cite{hatamizadeh2022unetr}, \textbf{DeepLabV3+} \cite{chen2018encoderdecoderatrousseparableconvolution}, and \textbf{U-Net++} \cite{zhou2018unetnestedunetarchitecture}. This diversity tests whether our findings generalize across different architectural paradigms (CNNs, attention mechanisms, transformers).

\section{Training Protocol}

For each architecture, we train the model on clean images to obtain a pretrained segmentation model. We then test on three variants of the test data: \textbf{Baseline} (original images degraded through the modality-specific degradation, $I_{\text{deg}}$), \textbf{Classical} (pseudo-labels $\mathbf{y}_{\text{pseudo}}$), and \textbf{Ours} (diffusion-enhanced $\hat{\mathbf{x}}_{\text{enhanced}}$). All models use identical hyperparameters: Adam optimizer ($\beta_1 = 0.9$, $\beta_2 = 0.999$), learning rate $10^{-4}$ with cosine annealing, batch size 16, 500 epochs with early stopping (patience 20), combined Dice and cross-entropy loss $\mathcal{L} = \mathcal{L}_{\text{Dice}} + \mathcal{L}_{\text{CE}}$, and standard augmentations (rotation $\pm 15^{\circ}$, scaling $0.9$--$1.1$, flipping). This ensures a fair comparison across image variants. We additionally evaluate a realistic scenario in which segmentation models are trained on degraded images and tested on our enhanced outputs, simulating clinical conditions where clean training data is unavailable.
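
For concreteness, one common form of the combined loss is the soft Dice loss plus the pixel-wise cross-entropy. The exact variant is not specified in the text, so the definitions below are an assumption; $p_{i,c}$ denotes the predicted probability of class $c$ at pixel $i$, $g_{i,c}$ the one-hot ground truth, $N$ the number of pixels, and $C$ the number of classes (all symbols introduced here for illustration):
\begin{equation}
\mathcal{L}_{\text{Dice}} = 1-\frac{1}{C}\sum_{c=1}^{C}\frac{2\sum_{i} p_{i,c}\,g_{i,c}}{\sum_{i} p_{i,c}+\sum_{i} g_{i,c}},
\qquad
\mathcal{L}_{\text{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} g_{i,c}\log p_{i,c}.
\end{equation}
The Dice term rewards region overlap regardless of class size, while the cross-entropy term supplies well-behaved per-pixel gradients.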

\subsection{Evaluation Metrics}

We evaluate segmentation using Dice, IoU, Average Surface Distance (ASD), and the 95th percentile Hausdorff Distance (HD95). To quantify enhancement quality, we use the no-reference Natural Image Quality Evaluator (NIQE). Let $P_c$ and $G_c$ denote predicted and ground-truth masks for class $c$.

\paragraph{Dice:}
\begin{equation}
\text{Dice}_c = \frac{2|P_c \cap G_c|}{|P_c| + |G_c|}.
\end{equation}

\paragraph{IoU:}
\begin{equation}
\text{IoU}_c = \frac{|P_c \cap G_c|}{|P_c \cup G_c|}.
\end{equation}
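As a worked example of the two overlap metrics with hypothetical counts, suppose $|P_c|=100$, $|G_c|=120$, and $|P_c\cap G_c|=90$ pixels, so that $|P_c\cup G_c|=100+120-90=130$. Then
\begin{equation}
\text{Dice}_c=\frac{2\cdot 90}{100+120}\approx 0.818,\qquad
\text{IoU}_c=\frac{90}{130}\approx 0.692,
\end{equation}
consistent with the identity $\text{Dice}_c = 2\,\text{IoU}_c/(1+\text{IoU}_c)$, which holds for each individual mask pair but not in general for values averaged over classes or images.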

\paragraph{ASD:}
\begin{equation}
\text{ASD}(P_c,G_c)
= \frac{1}{|S_P|+|S_G|}
\!\left(
\sum_{p\in S_P} d(p,S_G)
+
\sum_{g\in S_G} d(g,S_P)
\right),
\end{equation}
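where $S_P$ and $S_G$ are the boundary pixels of $P_c$ and $G_c$, and $d(p,S)=\min_{s\in S}\|p-s\|_2$ is the distance from a point $p$ to the nearest point of $S$.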

\paragraph{HD95:}
\begin{equation}
\text{HD95}(P_c,G_c)
= \operatorname{percentile}_{95}\!\left(
\{d(p,S_G)\}_{p\in S_P}
\cup
\{d(g,S_P)\}_{g\in S_G}
\right).
\end{equation}
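For illustration, take hypothetical boundary distances $\{1,1,2,4\}$ from the points of $S_P$ to $S_G$ and $\{1,2,2,3\}$ from the points of $S_G$ to $S_P$ (in pixels). Then
\begin{equation}
\text{ASD}=\frac{(1+1+2+4)+(1+2+2+3)}{4+4}=2.0\ \text{px},
\end{equation}
while the classical Hausdorff distance would be the maximum, $4$ px. HD95 replaces the maximum by the 95th percentile of the pooled distances $\{1,1,1,2,2,2,3,4\}$, which lies between $3$ and $4$ px inclusive (the exact value depends on the percentile interpolation convention), making it less sensitive to a single outlying boundary point.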

\paragraph{NIQE:}
\begin{equation}
\text{NIQE}(I) = \sqrt{(\mathbf{v} - \mathbf{v}_{\text{NSS}})^T \left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1} (\mathbf{v} - \mathbf{v}_{\text{NSS}})},
\end{equation}
where $\mathbf{v}$ and $\Sigma_1$ are the mean vector and covariance matrix of the natural scene statistics (NSS) features extracted from image $I$, and $\mathbf{v}_{\text{NSS}}$ and $\Sigma_2$ are the corresponding statistics of the natural image database. Lower NIQE indicates better perceptual quality.

\subsection{Experimental Design}

\subsubsection{Datasets}

\paragraph{CAMUS (Ultrasound):}
The Cardiac Acquisitions for Multi-structure Ultrasound Segmentation dataset \cite{leclerc2019deep} contains 2D echocardiography from 500 patients with end-diastolic and end-systolic frames ($256 \times 256$ pixels). Expert annotations include the left ventricle, myocardium, and left atrium. We use 450 patients for training and 50 for testing, as provided in the dataset.

\paragraph{ACDC (MRI):}
The Automated Cardiac Diagnosis Challenge dataset \cite{bernard2018deep} contains short-axis cardiac cine-MRI from 150 patients across five groups: four pathologies (dilated cardiomyopathy, hypertrophic cardiomyopathy, myocardial infarction, and abnormal right ventricle) and healthy controls. Annotations include the right ventricle, myocardium, and left ventricle ($256 \times 256$ pixels). We use 100 patients for training and 50 for testing, as provided in the dataset.

We implement our framework in PyTorch 2.13 with the HuggingFace Diffusers, MONAI, and OpenCV libraries. All experiments run on NVIDIA H200 GPUs (141\,GB). Inference with 10 reverse-diffusion steps takes approximately 5--10 seconds per image, depending largely on the pseudo-label generation phase, which is acceptable for non-emergency clinical workflows.

\paragraph{Reproducibility:} Code is available at \url{https://github.com/pks716/MIDL_26}.

\section{Results}
\subsection{Segmentation Performance}

Tables~\ref{tab:segmentation_results} and~\ref{tab:acdc_segmentation_results} present comprehensive segmentation results. Our method consistently outperforms both degraded baselines and classical preprocessing across all architectures and modalities.
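
To put the gains in relative terms, the U-Net rows of Tables~\ref{tab:segmentation_results} and~\ref{tab:acdc_segmentation_results} give the following changes from degraded input to our enhanced output (percentages computed from the reported means):
\begin{equation}
\begin{aligned}
\text{CAMUS:}\quad & \text{Dice } 0.822 \to 0.859\ (\approx +4.5\%), && \text{HD95 } 21.41 \to 18.89\ \text{px}\ (\approx -11.8\%),\\
\text{ACDC:}\quad & \text{Dice } 0.729 \to 0.785\ (\approx +7.7\%), && \text{HD95 } 11.57 \to 9.42\ \text{px}\ (\approx -18.6\%).
\end{aligned}
\end{equation}
For this architecture, the relative reduction in boundary error is thus larger than the relative gain in overlap on both datasets.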

\begin{table}[!ht]
\centering
\caption{Segmentation performance comparison across architectures and enhancement methods on the CAMUS ultrasound dataset. Reported metrics are averaged across cardiac structures (LV cavity, myocardium, LA cavity), excluding background. Bold indicates the best performance per architecture. NIQE varies across rows because the degradation is sampled randomly.}
\label{tab:segmentation_results}
\resizebox{\textwidth}{!}{
\begin{tabular}{l|ccc|ccc|ccc|ccc|ccc}
\toprule
\multirow{2}{*}{\textbf{Architecture}} & \multicolumn{3}{c|}{\textbf{Dice $\uparrow$}} & \multicolumn{3}{c|}{\textbf{IoU $\uparrow$}} & \multicolumn{3}{c|}{\textbf{HD95 (px) $\downarrow$}} & \multicolumn{3}{c|}{\textbf{ASD (px) $\downarrow$}} & \multicolumn{3}{c}{\textbf{NIQE $\downarrow$}} \\
\cmidrule(lr){2-4} \cmidrule(lr){5-7} \cmidrule(lr){8-10} \cmidrule(lr){11-13} \cmidrule(lr){14-16}
& Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} \\
\midrule
U-Net & 0.822 & 0.839 & \textbf{0.859} & 0.712 & 0.727 & \textbf{0.746} & 21.41 & 20.20 & \textbf{18.89} & 9.15 & 7.35 & \textbf{6.55} & 11.19 & 5.91 & \textbf{5.33} \\
& \scriptsize{$\pm$0.089} & \scriptsize{$\pm$0.067} & \scriptsize{\textbf{$\pm$0.054}} & \scriptsize{$\pm$0.104} & \scriptsize{$\pm$0.081} & \scriptsize{\textbf{$\pm$0.067}} & \scriptsize{$\pm$2.31} & \scriptsize{$\pm$1.87} & \scriptsize{\textbf{$\pm$1.52}} & \scriptsize{$\pm$0.67} & \scriptsize{$\pm$0.52} & \scriptsize{\textbf{$\pm$0.41}} & \scriptsize{$\pm$1.24} & \scriptsize{$\pm$1.03} & \scriptsize{\textbf{$\pm$0.97}} \\
\midrule
Attention U-Net & 0.831 & 0.844 & \textbf{0.862} & 0.710 & 0.723 & \textbf{0.741} & 20.33 & 20.19 & \textbf{19.18} & 8.81 & 7.22 & \textbf{5.31} & 11.22 & 5.63 & \textbf{5.35} \\
& \scriptsize{$\pm$0.086} & \scriptsize{$\pm$0.065} & \scriptsize{\textbf{$\pm$0.053}} & \scriptsize{$\pm$0.102} & \scriptsize{$\pm$0.079} & \scriptsize{\textbf{$\pm$0.066}} & \scriptsize{$\pm$2.25} & \scriptsize{$\pm$1.83} & \scriptsize{\textbf{$\pm$1.49}} & \scriptsize{$\pm$0.66} & \scriptsize{$\pm$0.51} & \scriptsize{\textbf{$\pm$0.40}} & \scriptsize{$\pm$1.22} & \scriptsize{$\pm$1.01} & \scriptsize{\textbf{$\pm$0.96}} \\
\midrule
UNETR & 0.830 & 0.839 & \textbf{0.854} & 0.721 & 0.725 & \textbf{0.744} & 20.35 & 19.58 & \textbf{19.10} & 8.55 & 7.35 & \textbf{5.65} & 10.35 & 5.75 & \textbf{5.61} \\
& \scriptsize{$\pm$0.087} & \scriptsize{$\pm$0.066} & \scriptsize{\textbf{$\pm$0.054}} & \scriptsize{$\pm$0.103} & \scriptsize{$\pm$0.080} & \scriptsize{\textbf{$\pm$0.067}} & \scriptsize{$\pm$2.27} & \scriptsize{$\pm$1.84} & \scriptsize{\textbf{$\pm$1.50}} & \scriptsize{$\pm$0.66} & \scriptsize{$\pm$0.51} & \scriptsize{\textbf{$\pm$0.40}} & \scriptsize{$\pm$1.21} & \scriptsize{$\pm$1.01} & \scriptsize{\textbf{$\pm$0.96}} \\
\midrule
DeepLabV3+ & 0.845 & 0.859 & \textbf{0.868} & 0.747 & 0.758 & \textbf{0.771} & 19.76 & 19.10 & \textbf{18.49} & 8.12 & 7.18 & \textbf{6.22} & 10.53 & 6.18 & \textbf{5.45} \\
& \scriptsize{$\pm$0.083} & \scriptsize{$\pm$0.062} & \scriptsize{\textbf{$\pm$0.050}} & \scriptsize{$\pm$0.099} & \scriptsize{$\pm$0.076} & \scriptsize{\textbf{$\pm$0.063}} & \scriptsize{$\pm$2.16} & \scriptsize{$\pm$1.73} & \scriptsize{\textbf{$\pm$1.38}} & \scriptsize{$\pm$0.63} & \scriptsize{$\pm$0.49} & \scriptsize{\textbf{$\pm$0.38}} & \scriptsize{$\pm$1.18} & \scriptsize{$\pm$0.98} & \scriptsize{\textbf{$\pm$0.93}} \\
\midrule
U-Net++ & 0.841 & 0.852 & \textbf{0.863} & 0.745 & 0.753 & \textbf{0.772} & 19.98 & 18.87 & \textbf{18.41} & 8.24 & 7.71 & \textbf{6.37} & 10.69 & 6.35 & \textbf{5.80} \\
& \scriptsize{$\pm$0.084} & \scriptsize{$\pm$0.063} & \scriptsize{\textbf{$\pm$0.051}} & \scriptsize{$\pm$0.100} & \scriptsize{$\pm$0.077} & \scriptsize{\textbf{$\pm$0.064}} & \scriptsize{$\pm$2.19} & \scriptsize{$\pm$1.76} & \scriptsize{\textbf{$\pm$1.41}} & \scriptsize{$\pm$0.64} & \scriptsize{$\pm$0.50} & \scriptsize{\textbf{$\pm$0.39}} & \scriptsize{$\pm$1.20} & \scriptsize{$\pm$0.99} & \scriptsize{\textbf{$\pm$0.94}} \\
\bottomrule
\end{tabular}
}
\end{table}

\begin{table}[!ht]
\centering
\caption{Segmentation performance comparison across architectures and enhancement methods on the ACDC cardiac MRI dataset. Reported metrics are averaged across cardiac structures (RV cavity, myocardium, LV cavity), excluding background. Bold indicates the best performance per architecture. NIQE varies across rows because the degradation is sampled randomly.}
\label{tab:acdc_segmentation_results}
\resizebox{\textwidth}{!}{
\begin{tabular}{l|ccc|ccc|ccc|ccc|ccc}
\toprule
\multirow{2}{*}{\textbf{Architecture}} & \multicolumn{3}{c|}{\textbf{Dice $\uparrow$}} & \multicolumn{3}{c|}{\textbf{IoU $\uparrow$}} & \multicolumn{3}{c|}{\textbf{HD95 (px) $\downarrow$}} & \multicolumn{3}{c|}{\textbf{ASD (px) $\downarrow$}} & \multicolumn{3}{c}{\textbf{NIQE $\downarrow$}} \\
\cmidrule(lr){2-4} \cmidrule(lr){5-7} \cmidrule(lr){8-10} \cmidrule(lr){11-13} \cmidrule(lr){14-16}
& Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} & Degr. & Class. & \textbf{Ours} \\
\midrule
U-Net & 0.729 & 0.752 & \textbf{0.785} & 0.637 & 0.672 & \textbf{0.701} & 11.57 & 10.87 & \textbf{9.42} & 5.67 & 5.03 & \textbf{4.61} & 8.45 & 7.32 & \textbf{7.11} \\
& \scriptsize{$\pm$0.095} & \scriptsize{$\pm$0.071} & \scriptsize{\textbf{$\pm$0.058}} & \scriptsize{$\pm$0.112} & \scriptsize{$\pm$0.086} & \scriptsize{\textbf{$\pm$0.072}} & \scriptsize{$\pm$2.56} & \scriptsize{$\pm$1.98} & \scriptsize{\textbf{$\pm$1.67}} & \scriptsize{$\pm$0.73} & \scriptsize{$\pm$0.58} & \scriptsize{\textbf{$\pm$0.47}} & \scriptsize{$\pm$1.38} & \scriptsize{$\pm$1.15} & \scriptsize{\textbf{$\pm$1.06}} \\
\midrule
Attention U-Net & 0.802 & 0.829 & \textbf{0.847} & 0.723 & 0.741 & \textbf{0.760} & 11.16 & 10.45 & \textbf{9.28} & 5.11 & 4.89 & \textbf{4.50} & 8.23 & 7.89 & \textbf{6.74} \\
& \scriptsize{$\pm$0.088} & \scriptsize{$\pm$0.066} & \scriptsize{\textbf{$\pm$0.053}} & \scriptsize{$\pm$0.104} & \scriptsize{$\pm$0.080} & \scriptsize{\textbf{$\pm$0.066}} & \scriptsize{$\pm$2.38} & \scriptsize{$\pm$1.84} & \scriptsize{\textbf{$\pm$1.54}} & \scriptsize{$\pm$0.68} & \scriptsize{$\pm$0.54} & \scriptsize{\textbf{$\pm$0.44}} & \scriptsize{$\pm$1.31} & \scriptsize{$\pm$1.09} & \scriptsize{\textbf{$\pm$1.01}} \\
\midrule
UNETR & 0.741 & 0.754 & \textbf{0.768} & 0.649 & 0.698 & \textbf{0.717} & 10.31 & 10.02 & \textbf{9.83} & 5.89 & 5.21 & \textbf{4.82} & 8.78 & 7.45 & \textbf{6.92} \\
& \scriptsize{$\pm$0.101} & \scriptsize{$\pm$0.078} & \scriptsize{\textbf{$\pm$0.065}} & \scriptsize{$\pm$0.118} & \scriptsize{$\pm$0.093} & \scriptsize{\textbf{$\pm$0.079}} & \scriptsize{$\pm$2.78} & \scriptsize{$\pm$2.15} & \scriptsize{\textbf{$\pm$1.89}} & \scriptsize{$\pm$0.79} & \scriptsize{$\pm$0.63} & \scriptsize{\textbf{$\pm$0.54}} & \scriptsize{$\pm$1.45} & \scriptsize{$\pm$1.23} & \scriptsize{\textbf{$\pm$1.15}} \\
\midrule
DeepLabV3+ & 0.812 & 0.828 & \textbf{0.837} & 0.731 & 0.739 & \textbf{0.748} & 9.48 & 9.12 & \textbf{8.81} & 5.60 & 4.98 & \textbf{4.57} & 8.71 & 8.05 & \textbf{7.51} \\
& \scriptsize{$\pm$0.092} & \scriptsize{$\pm$0.069} & \scriptsize{\textbf{$\pm$0.056}} & \scriptsize{$\pm$0.108} & \scriptsize{$\pm$0.084} & \scriptsize{\textbf{$\pm$0.070}} & \scriptsize{$\pm$2.48} & \scriptsize{$\pm$1.92} & \scriptsize{\textbf{$\pm$1.63}} & \scriptsize{$\pm$0.71} & \scriptsize{$\pm$0.56} & \scriptsize{\textbf{$\pm$0.46}} & \scriptsize{$\pm$1.35} & \scriptsize{$\pm$1.12} & \scriptsize{\textbf{$\pm$1.04}} \\
\midrule
U-Net++ & 0.824 & 0.831 & \textbf{0.836} & 0.742 & 0.751 & \textbf{0.767} & 8.89 & 8.58 & \textbf{8.18} & 5.54 & 4.93 & \textbf{4.47} & 8.31 & 6.98 & \textbf{6.45} \\
& \scriptsize{$\pm$0.090} & \scriptsize{$\pm$0.068} & \scriptsize{\textbf{$\pm$0.055}} & \scriptsize{$\pm$0.106} & \scriptsize{$\pm$0.082} & \scriptsize{\textbf{$\pm$0.069}} & \scriptsize{$\pm$2.43} & \scriptsize{$\pm$1.87} & \scriptsize{\textbf{$\pm$1.59}} & \scriptsize{$\pm$0.69} & \scriptsize{$\pm$0.55} & \scriptsize{\textbf{$\pm$0.45}} & \scriptsize{$\pm$1.33} & \scriptsize{$\pm$1.11} & \scriptsize{\textbf{$\pm$1.03}}\\
\bottomrule
\end{tabular}
}
\end{table}

\subsection{Qualitative Results}

Figure~\ref{fig:grid2} shows representative segmentation examples. For ultrasound, our method removes speckle while preserving myocardial texture and recovering fine structures. For MRI, the classical pipeline corrects bias fields but struggles with severe Rician noise in low-SNR regions; our method refines these areas while preserving tissue intensity relationships.

\begin{figure}[!ht]
\floatconts
  {fig:grid2}
  {\caption{Representative samples from the test datasets. 
CAMUS (columns 1--3): cardiac ultrasound showing degraded, pseudo-label, and enhanced images with their respective segmentation results. 
ACDC (columns 4--6): cardiac MRI showing degraded, pseudo-label, and enhanced images with their respective segmentation results.}}
  {\includegraphics[width=0.9\linewidth]{grid2.pdf}}
\end{figure}

\subsection{Ablation Study}

Table~\ref{tab:ablation} presents an ablation on CAMUS (U-Net) isolating the contribution of pseudo-label guidance. Without guidance, diffusion sampling collapses (Dice 0.484) and hallucinates structures, confirming that pseudo-labels are essential as differentiable guidance targets.

\begin{table}[!ht]
\centering
\caption{Ablation study on CAMUS dataset showing the necessity of pseudo-label guidance.}
\label{tab:ablation}
\begin{tabular}{lc}
\toprule
\textbf{Method (U-Net)} & \textbf{Dice} \\
\midrule
Classical pipeline alone & 0.839 \\
Diffusion without pseudo-label guidance & 0.484 \\
Ours (diffusion + pseudo-label guidance) & \textbf{0.859} \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Supervised Baseline Comparison}

We trained a supervised diffusion model on CAMUS (450 patients, 100k steps, $\sim$12 GPU-hours on H200, AdamW with learning rate $10^{-5}$ and cosine decay, L2 reconstruction loss) as a direct comparison. Our training-free method achieves Dice $0.863 \pm 0.051$ vs.\ $0.861 \pm 0.072$ for supervised diffusion, matching supervised performance without any training cost and without the risk of overfitting to limited medical data.

\subsection{Realistic Training Scenario}

Table~\ref{tab:realistic} reports results when the segmentation model is trained on degraded images (CAMUS, U-Net), simulating real-world clinical deployment where clean training data are unavailable. The benefit of enhancement persists in this scenario (Dice 0.862 vs.\ 0.841), which is consistent with genuine structural recovery.

\begin{table}[!ht]
\centering
\caption{Realistic training scenario: segmentation model trained on degraded images.}
\label{tab:realistic}
\begin{tabular}{llc}
\toprule
\textbf{Training Data} & \textbf{Test Data} & \textbf{Dice} \\
\midrule
Degraded & Degraded & 0.841 \\
Degraded & Ours (enhanced) & \textbf{0.862} \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Intensity Distribution Analysis}

Table~\ref{tab:intensity} shows that our method introduces minimal intensity shift relative to classical preprocessing on the CAMUS test set (values normalized to $[0,1]$), even as it improves boundary accuracy.

\begin{table}[!ht]
\centering
\caption{Intensity distribution analysis on CAMUS test set.}
\label{tab:intensity}
\begin{tabular}{lcc}
\toprule
\textbf{Method} & \textbf{Mean Intensity} & \textbf{Std (Variability)} \\
\midrule
Degraded & $0.190 \pm 0.036$ & $0.231 \pm 0.034$ \\
Classical (Pseudo-label) & $0.227 \pm 0.024$ & $0.247 \pm 0.020$ \\
Ours & $0.219 \pm 0.024$ & $0.239 \pm 0.020$ \\
\bottomrule
\end{tabular}
\end{table}
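
The shifts in Table~\ref{tab:intensity} can be read off directly. The following is plain arithmetic on the reported values, with $\Delta\mu$ and $\Delta\sigma$ as shorthand introduced here for differences in mean intensity and in intensity standard deviation:
\[
\begin{array}{rcl}
\Delta\mu_{\mathrm{ours-classical}} &=& 0.219 - 0.227 = -0.008,\\
\Delta\sigma_{\mathrm{ours-classical}} &=& 0.239 - 0.247 = -0.008,\\
\Delta\mu_{\mathrm{classical-degraded}} &=& 0.227 - 0.190 = +0.037.
\end{array}
\]
The shift between our output and the pseudo-label is therefore several times smaller than the shift that classical preprocessing itself introduces relative to the degraded input, and smaller than the reported $\pm 0.024$ and $\pm 0.020$ spreads.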

\section{Discussion}

\subsection{Key Findings}

Our hybrid framework demonstrates that pretrained diffusion models can enhance medical images when properly constrained by classical domain priors. Three key findings emerge: (1) gradient-guided sampling prevents hallucinations while enabling learned refinement; (2) the framework generalizes across modalities, anatomical structures, and deep learning architectures without requiring domain-specific training; (3) improvements in boundary accuracy (reduced HD95 and ASD) indicate better preservation of the fine structures that are critical for clinical measurements.

\subsection{Limitations and Future Work}

The optimal guidance scale $\lambda$ is modality-dependent and determined empirically; 
future work should explore adaptive guidance schedules that adjust $\lambda$ per-image 
based on degradation severity or anatomical region, potentially improving robustness 
across diverse clinical scenarios. Current work processes 2D slices independently, which 
is intentional: CAMUS provides only 2D frames, and ACDC is evaluated slice-wise in the 
official challenge. This unified 2D design reduces memory requirements (4\,GB vs.\ 16--32\,GB 
for 3D diffusion models) and generalizes to inherently 2D modalities (X-ray, single-slice 
CT), though extending to 3D could leverage inter-slice consistency at the cost of 
substantially higher memory requirements. The current pipeline runs in 5--10\,s per 
image (3--6\,s classical preprocessing, 2--4\,s diffusion sampling), targeting offline 
post-acquisition workflows such as ejection fraction analysis and clinical reporting rather 
than real-time scanning; distillation-based acceleration could enable faster deployment in 
latency-sensitive settings. We evaluate on synthetically degraded images using 
physics-derived noise models (multiplicative speckle for ultrasound, Rician for MRI), 
following established practice, because acquiring paired clean/degraded data is 
clinically infeasible; real-world validation with radiologist evaluation remains an 
important avenue for future work. Validation on additional modalities (CT, X-ray, 
microscopy) would further establish generalizability, with each modality potentially 
requiring custom classical pipelines while the core gradient guidance mechanism transfers.

\subsection{Conclusion}

We presented a training-free hybrid framework that bridges classical preprocessing and pretrained diffusion models for medical image enhancement. Gradient-based guidance toward modality-specific pseudo-labels ensures anatomical fidelity while enabling learned refinement, achieving consistent improvements across ultrasound and MRI datasets without domain-specific training. This work shows that pretrained natural-image models, when properly constrained by classical domain priors, can enhance medical images while preserving the interpretability essential for clinical adoption.

\midlacknowledgments{This work was supported by ICMR (Grant ID: FIW-2024-01-00000151),\\
Project No: IITM/ICMR/SS/537, IIT Mandi.\\
The authors gratefully acknowledge 
Dr.\ Aditya Nigam (IIT Mandi) for his supervision, research guidance, and 
provision of computational infrastructure. The authors also acknowledge 
Dr.\ Pankaj Gupta (PGIMER, Chandigarh) for his expert contributions on medical 
imaging modalities, clinical workflow considerations, and diagnostic requirements 
that informed the clinical relevance of this work.}

\clearpage  
\bibliography{midl26_391}

\end{document}