Interpretable Matching of Optical-SAR Image via Dynamically Conditioned Diffusion Models

Published: 20 Jul 2024, Last Modified: 05 Aug 2024 | MM2024 Poster | CC BY 4.0
Abstract: Driven by the fusion of complementary information from optical and synthetic aperture radar (SAR) images, optical-SAR image matching has drawn much attention. However, the significant radiometric differences between the two modalities make accurate matching challenging. Most existing approaches map SAR and optical images into a shared feature space to perform the matching, but they often fail to match robustly because that feature space is unknown and uninterpretable. Motivated by the interpretable latent space of diffusion models, this paper formulates an optical-SAR image translation and matching framework based on a dynamically conditioned diffusion model (DCDM) to achieve interpretable and robust optical-SAR cross-modal image matching. Specifically, to filter out outlier matching regions during the denoising process, a gated dynamic sparse cross-attention module is proposed to enable efficient and effective long-range interactions between multi-grained features of the cross-modal data. In addition, a spatial position consistency constraint is designed so that the cross-attention features perceive the spatial correspondence between the modalities, improving matching precision. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in both matching accuracy and interpretability.
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: Cross-modal image matching techniques are central to multi-modal information fusion in the remote sensing field. To address the robustness and interpretability of multi-modal image matching, this study develops a dynamically conditioned diffusion model approach to optical and synthetic aperture radar (SAR) remote sensing image matching. The proposed method offers a novel approach to interpretable and robust matching for the multi-modal remote sensing matching task, and it is expected to promote the development of interpretable learning-based matching methods.
Supplementary Material: zip
Submission Number: 5695
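
Illustrative note: the abstract names a gated dynamic sparse cross-attention module but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' method. It assumes that "sparse" means each query token keeps only its top-k highest-scoring keys and that "gated" means a sigmoid gate scales the attended output. The function name `gated_topk_cross_attention`, the parameter `k`, and the random projection weights (standing in for learned ones) are all hypothetical.

```python
import numpy as np

def gated_topk_cross_attention(q_feats, kv_feats, k=4, seed=0):
    """Hypothetical sketch: top-k sparse cross-attention with a sigmoid gate.

    q_feats:  (Nq, d) tokens from one modality (e.g. optical).
    kv_feats: (Nk, d) tokens from the other modality (e.g. SAR); needs k <= Nk.
    """
    rng = np.random.default_rng(seed)
    d = q_feats.shape[1]
    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_g = (rng.standard_normal((d, d)) / np.sqrt(d)
                          for _ in range(4))
    q, key, v = q_feats @ w_q, kv_feats @ w_k, kv_feats @ w_v

    scores = q @ key.T / np.sqrt(d)  # (Nq, Nk) similarity scores

    # Sparsify: keep only the k largest scores in each query row,
    # masking the rest so they receive zero attention weight.
    kth_largest = np.partition(scores, -k, axis=1)[:, -k][:, None]
    masked = np.where(scores >= kth_largest, scores, -np.inf)

    # Numerically stable softmax over the surviving entries.
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Sigmoid gate computed from the query tokens scales the attended output.
    gate = 1.0 / (1.0 + np.exp(-(q_feats @ w_g)))
    return gate * (weights @ v), weights
```

In this sketch, keys outside a query's top-k receive exactly zero weight, which is one simple way to suppress unlikely (outlier) matching regions; the paper's dynamic selection rule may differ.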