Diff-HRNet: A Diffusion Model-Based High-Resolution Network for Remote Sensing Semantic Segmentation
Abstract: Semantic segmentation methods based on deep neural networks predominantly employ supervised learning and thus rely heavily on the quantity and quality of annotated samples. Due to the complexity of high-resolution remote sensing imagery, obtaining sufficient and precise pixel-level labeled data is highly challenging. This letter introduces a novel self-supervised learning method that uses a pretrained denoising diffusion probabilistic model (DDPM) to leverage semantic information from large-scale unlabeled remote sensing imagery. Building on this, a multistage fusion scheme between pretrained features and high-resolution features is proposed, enabling the network to exploit the prior information provided by the pretrained model more effectively while preserving the rich semantic details of high-resolution images. Experimental results on two remote sensing semantic segmentation datasets show that the proposed Diff-HRNet outperforms all compared methods, demonstrating the potential of pretrained diffusion models for extracting crucial feature representations in semantic segmentation tasks.
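The multistage fusion scheme described in the abstract can be sketched in miniature: intermediate feature maps from a pretrained diffusion denoiser (coarse but semantically rich) are upsampled to match a high-resolution branch and merged with it. The following NumPy sketch is purely illustrative; the function names, shapes, and nearest-neighbor upsampling are assumptions, not the paper's actual implementation.

```python
import numpy as np

def nearest_upsample(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_stage(hr_feat, ddpm_feat):
    """One hypothetical fusion stage: upsample coarse diffusion features
    to the high-resolution grid, then concatenate along channels."""
    factor = hr_feat.shape[1] // ddpm_feat.shape[1]
    return np.concatenate([hr_feat, nearest_upsample(ddpm_feat, factor)], axis=0)

# Toy features: a high-resolution branch at 64x64 with 32 channels, and
# diffusion-model features at 16x16 with 256 channels (illustrative sizes).
hr = np.random.rand(32, 64, 64)
ddpm = np.random.rand(256, 16, 16)
fused = fuse_stage(hr, ddpm)
print(fused.shape)  # (288, 64, 64)
```

In a real network the concatenation would typically be followed by learned convolutions at each stage, so that the segmentation head sees both the fine spatial detail of the high-resolution branch and the diffusion prior.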
External IDs: dblp:journals/lgrs/WuLSPLC25