A Dual-Decoder-VAE-Based Latent Diffusion Model for PAN-Sharpening

Hyun-Ho Kim, Munchurl Kim

Published: 01 Jan 2025, Last Modified: 06 Nov 2025, IEEE Geoscience and Remote Sensing Letters, CC BY-SA 4.0

Abstract: High-resolution (HR) electro-optical (EO) satellites generally acquire multispectral (MS) images at a lower spatial resolution than the corresponding panchromatic (PAN) images owing to physical constraints. Nevertheless, HR MS images remain critical in fields such as defense, industrial monitoring, and disaster response, so PAN-sharpening techniques have been widely studied. Recent progress in PAN-sharpening has been driven by deep learning (DL), with diffusion models (DMs) emerging as a promising direction. Recent pixel-domain diffusion-based PAN-sharpening methods generate high-quality PAN-sharpened (PS) images, but they generally require 25–50 denoising steps, which imposes high computational complexity. To mitigate this limitation, we explore the use of a latent DM (LDM) for the PAN-sharpening task. However, the conventional variational autoencoder (CVAE) in an LDM cannot accurately reconstruct satellite EO images, which have high bit depths, relatively small dynamic ranges (DRs) within those bit depths, and multiple channels. In this study, we first identify the limitations of the CVAE and propose a dual-decoder-VAE (DDV) that is better suited to satellite EO images. Furthermore, we introduce a DDV-based latent diffusion PAN-sharpening (DDV-LDP) model. DDV-LDP achieves 0.06 dB and 0.98 dB higher PSNR values than a state-of-the-art (SOTA) diffusion-based method (UKnowDif-T) on the KOMPSAT-3A and WorldView-III datasets, respectively, while reducing testing time by 99.5%.
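The reported figures can be put in context with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: it assumes PSNR is computed with the peak value set by the nominal bit depth, and that the 99.5% reduction is measured relative to the baseline's testing time. The function names and the 11-bit example are hypothetical and chosen only for illustration.

```python
import numpy as np


def psnr(reference, estimate, bit_depth):
    """PSNR in dB, assuming the peak value is 2**bit_depth - 1."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    peak = 2.0 ** bit_depth - 1.0
    return float(10.0 * np.log10(peak ** 2 / mse))


def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline,
    for a given PSNR gain in dB (same peak value for both)."""
    return 10.0 ** (-gain_db / 10.0)


def speedup_from_reduction(reduction_percent):
    """Speedup factor implied by a percentage reduction in run time."""
    return 1.0 / (1.0 - reduction_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical 4-band, 11-bit image with a uniform error of one level.
    ref = np.full((4, 8, 8), 500.0)
    est = ref + 1.0
    print(f"PSNR: {psnr(ref, est, bit_depth=11):.2f} dB")
    print(f"MSE ratio at 0.98 dB gain: {mse_ratio_for_gain(0.98):.3f}")
    print(f"Speedup at 99.5% reduction: {speedup_from_reduction(99.5):.0f}x")
```

Under these assumptions, a 99.5% reduction in testing time corresponds to a speedup of roughly 200 times, and a PSNR gain translates into a multiplicative reduction in mean squared error rather than an additive one.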

External IDs: doi:10.1109/lgrs.2025.3615226