Coupling Local-Nonlocal Feature Representation for SAR and Multispectral Image Fusion

Published: 01 Jan 2024, Last Modified: 25 Jul 2025 · IEEE Geosci. Remote Sens. Lett. 2024 · CC BY-SA 4.0
Abstract: In this letter, we propose CLN-Fusion, a novel hybrid fusion approach that leverages the merits of convolutional neural networks (CNNs) and vision transformers (ViTs) to couple local-nonlocal feature representations between synthetic aperture radar (SAR) and multispectral (MS) images. Specifically, we construct a paired token projection (PTP) to enforce consistency of the scene content observed by the two modalities. Meanwhile, to merge their complementary features, structures in SAR images and textures in MS images, we establish pyramid CNN and ViT branches that assemble two pure feature volumes, carrying convolutional inductive biases and nonlocal statistical correlation respectively, into a mixed one. Furthermore, CLN-Fusion maintains semantic alignment by maximizing mutual information throughout the PTP. Extensive experiments on three scenarios drawn from Sentinel-1 and Landsat-8 data validate the superiority of CLN-Fusion in quantitative metrics: it achieves peak signal-to-noise ratios (PSNRs) of 33.1565, 30.9815, and 29.9821, the best fusion performance in contrast to other state-of-the-art (SOTA) methods. The code for this work will be available at https://github.com/Blueseatear/CLN-Fusion.
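
To make the hybrid design concrete, below is a minimal PyTorch sketch of coupling a local CNN branch with a nonlocal ViT branch behind a shared paired-token-style projection. Everything here, the module names, layer sizes, and the way the two modalities are combined, is an illustrative assumption, not the authors' architecture; the actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn

class CLNFusionSketch(nn.Module):
    """Hypothetical sketch, NOT the authors' code: projects SAR and MS
    inputs into a shared feature space (a stand-in for the PTP), runs a
    local CNN branch and a nonlocal ViT branch over the shared features,
    and mixes the two pure feature volumes into one fused output."""

    def __init__(self, sar_ch=1, ms_ch=3, dim=64, heads=4):
        super().__init__()
        # Stand-in for the paired token projection: 1x1 convolutions
        # that map both modalities to the same channel width.
        self.sar_proj = nn.Conv2d(sar_ch, dim, kernel_size=1)
        self.ms_proj = nn.Conv2d(ms_ch, dim, kernel_size=1)
        # Local branch: stacked convolutions (convolutional inductive bias).
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        # Nonlocal branch: one self-attention layer over flattened tokens.
        self.vit_branch = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        # Mix the two feature volumes into a fused MS-like image.
        self.mix = nn.Conv2d(2 * dim, ms_ch, kernel_size=1)

    def forward(self, sar, ms):
        shared = self.sar_proj(sar) + self.ms_proj(ms)   # (B, dim, H, W)
        local = self.cnn_branch(shared)                  # local structures
        b, c, h, w = shared.shape
        tokens = shared.flatten(2).transpose(1, 2)       # (B, H*W, dim)
        nonloc = self.vit_branch(tokens)                 # nonlocal correlation
        nonloc = nonloc.transpose(1, 2).reshape(b, c, h, w)
        return self.mix(torch.cat([local, nonloc], dim=1))

# Usage: fuse a single-band SAR patch with a three-band MS patch.
sar = torch.randn(1, 1, 64, 64)
ms = torch.randn(1, 3, 64, 64)
fused = CLNFusionSketch()(sar, ms)
print(fused.shape)  # torch.Size([1, 3, 64, 64])
```

During training, the mutual-information maximization the abstract attributes to the PTP would be an additional loss term on the paired SAR/MS projections; it is omitted here for brevity.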