Abstract: Remote sensing (RS) scene classification (RSSC) is a prominent research topic in the RS community. Multilevel feature fusion is an important approach to RSSC, and many such methods have been proposed in recent years. Despite their success, existing methods leave room for improvement, particularly in distinguishing the contributions of different multilevel features and in fusing them fully and effectively. To address these issues and fully exploit the potential of multilevel features for RSSC, we propose a new model named the multiscale sparse cross-attention network (MSCN), which focuses not only on the effectiveness of feature learning but also on the rationality of feature fusion. Specifically, MSCN first extracts multilevel features using a pretrained ResNet50 and divides them into high- and low-level features according to the information they convey. Second, a multiscale sparse cross-attention (MSC) module is developed to cross-fuse the high-level feature with the various low-level features, thereby effectively mining helpful information from multilevel features. During fusion, MSC not only explores multiscale cues in RS scenes but also mitigates the negative impact of irrelevant information through sparse operations. Third, a group convolutional block attention module (CBAM) enhancer (GCE) is presented to enhance the representation of the classification features. GCE detects locally salient information within the classification features using a grouped CBAM and further strengthens crucial details by readjusting the CBAM attention weights, thereby improving the discrimination of the classification features. Extensive experiments on three public RSSC datasets show that the proposed MSCN achieves superior classification accuracy, surpassing many existing methods. Our source code is available at https://github.com/TangXu-Group/Remote-Sensing-Images-Classification/tree/main/MSCN.
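To illustrate the general idea of sparse cross-attention described above, the following is a minimal, hypothetical NumPy sketch (not the paper's actual MSC module): high-level tokens attend over low-level tokens, and for each query only the top-k attention scores are kept before the softmax, so that irrelevant low-level responses are suppressed. The function name, top-k thresholding scheme, and shapes are illustrative assumptions.

```python
import numpy as np

def sparse_cross_attention(high, low, k=4):
    """Hypothetical sketch of sparse cross-attention between feature levels.

    high: (Nh, d) high-level tokens (queries)
    low:  (Nl, d) low-level tokens (keys/values)
    k:    number of low-level tokens each query may attend to
    """
    d = high.shape[-1]
    # Scaled dot-product similarity between every query and every key.
    scores = high @ low.T / np.sqrt(d)                      # (Nh, Nl)
    # Sparse step (assumed): mask all but the k largest scores per row.
    thresh = np.sort(scores, axis=-1)[:, -k][:, None]       # k-th largest per query
    masked = np.where(scores >= thresh, scores, -np.inf)
    # Numerically stable softmax over the surviving scores.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Fuse: each high-level token becomes a sparse mixture of low-level tokens.
    return weights @ low                                    # (Nh, d)
```

In this toy form the sparsity acts as a hard top-k filter; the actual MSC module additionally operates at multiple scales, which this sketch omits for brevity.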