Abstract: With the advancement of remote sensing (RS) technology and satellite observation, fusing multisource data, such as multispectral (MS) and panchromatic (PAN) images, has become increasingly important. However, semantic differences between modalities can hinder a fusion model's ability to learn effective feature mappings. To reconstruct richer and more consistent features during fusion, we propose an adaptive dual-supervised cross-deep dependency network (ADCD-Net), which is trained in two stages. Stage I uses a semantic perceptual self-supervision strategy (SPS) to learn deep features across different modalities, reducing semantic differences while mining each modality's own non-singular features. Stage II uses a deep temporal Mamba module (DTM-Module) to interactively learn the outputs of each network layer, enabling them to take part in deep feature reinforcement and improve the classification performance of semantic information. Finally, to eliminate channel redundancy during the two-stage training process while enhancing spatial location memory and feature discrimination in the 2-D features, we propose a deformable interactive attention module (DIA-Module) that further bolsters feature representation. Additionally, we conduct comparative and transfer experiments on multiple RS datasets, achieving outstanding classification results. Our code is available at https://github.com/ChenC1027/ADCD-Net.