Multimodal Diffusion Bridge With Attention-Based SAR Fusion for Satellite Image Cloud Removal

Yuyang Hu, Suhas Lohit, Ulugbek S. Kamilov, Tim K. Marks

Published: 01 Jan 2025, Last Modified: 04 Nov 2025, Venue: IEEE Transactions on Geoscience and Remote Sensing, License: CC BY-SA 4.0

Abstract: Deep learning has achieved some success in cloud removal (CR) for optical satellite images by fusing them with synthetic aperture radar (SAR) images. Recently, diffusion models (DMs) have emerged as powerful tools for CR, delivering higher-quality estimates than earlier methods by sampling from cloud-free distributions. However, DMs suffer from limitations that can result in suboptimal performance. In particular, DMs initiate sampling from pure Gaussian noise, which complicates the sampling trajectory. Moreover, current methods often fuse SAR and optical data inadequately; simple concatenation of these disparate modalities at the input stage typically yields suboptimal results. To address these limitations, we propose diffusion bridges for cloud removal (DB-CR), which directly bridge the cloudy and cloud-free image distributions. In addition, we propose a novel multimodal diffusion bridge (DB) architecture with a two-branch backbone for multimodal image restoration, incorporating an efficient backbone and dedicated cross-modality fusion blocks to effectively extract and fuse features from SAR and optical images. By formulating CR as a diffusion-bridge problem and leveraging this tailored architecture, DB-CR achieves high-fidelity results while being computationally efficient. We evaluated DB-CR on the SEN12MS-CR dataset, demonstrating that it achieves state-of-the-art results.
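
To illustrate the difference between starting from pure Gaussian noise and bridging two image distributions, the following is a minimal sketch of one common bridge construction, a Brownian bridge pinned at a cloud-free image at t = 0 and a cloudy image at t = 1. It is an assumption for illustration only: the abstract does not specify the bridge formulation used by DB-CR, and the function name, the `sigma` parameter, and the toy array shapes are hypothetical.

```python
import numpy as np


def brownian_bridge_sample(x_clean, x_cloudy, t, sigma=1.0, rng=None):
    """Draw an intermediate state x_t of a Brownian bridge.

    The bridge is pinned at x_clean for t = 0 and at x_cloudy for t = 1.
    This is a generic construction, not necessarily the one used in DB-CR.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The mean interpolates linearly between the two endpoints.
    mean = (1.0 - t) * x_clean + t * x_cloudy
    # The standard deviation vanishes at both endpoints and peaks at t = 0.5.
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x_clean.shape)


# Hypothetical toy data standing in for a paired cloud-free / cloudy patch.
rng = np.random.default_rng(0)
x_clean = rng.random((13, 8, 8))
x_cloudy = rng.random((13, 8, 8))

# An intermediate state halfway along the bridge.
x_mid = brownian_bridge_sample(x_clean, x_cloudy, 0.5, rng=rng)
```

Under this construction, the reverse process would start at the cloudy observation (t = 1) rather than at pure noise, which is the intuition behind treating CR as a bridge between the two distributions.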

External IDs: doi:10.1109/tgrs.2025.3604654