Abstract: Hyperspectral image (HSI) classification plays a major role in remote sensing. The combination of convolutional neural networks (CNNs) and transformers has garnered increasing attention in HSI classification, yielding impressive performance. However, these approaches primarily focus on modeling local–global feature representations, often ignoring the rich multirange information inherent in HSIs. To address this limitation, this work proposes a mid-range convolutional modulated transformer network (MCMTN) for HSI classification. Compared to conventional CNN and transformer architectures, the proposed method employs large-kernel convolutions to simplify self-attention (SA) operations and facilitate the integration of short-, mid-, and long-range information in HSIs. Specifically, we employ a lightweight spatial–spectral module (LWSSM) to extract the short-range spatial features and spectral features of HSIs. Furthermore, a spatial compression module (SCM) models the long-range spatial information using a simple compress-decompress strategy. We also introduce a mid-range convolutional modulation (MCM) into the network, which captures mid-range spatial dependencies in HSIs while enabling semantic information interaction in a transformer manner. Compared with traditional HSI classification methods, the proposed MCMTN extracts multidimensional representations with lower time consumption. Experimental results on several benchmark datasets show that MCMTN achieves better classification performance than several state-of-the-art models. The code is available at https://github.com/szq0816/MCMTN
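To make the idea of using a large-kernel convolution in place of self-attention more concrete, the following is a minimal NumPy sketch of a generic convolutional modulation step. It is not taken from the MCMTN repository: the function names (`depthwise_conv2d`, `conv_modulation`), the 7x7 kernel, the 9x9 patch size, and the exact form (a depthwise-convolved branch multiplied element-wise with a linearly projected value branch) are all illustrative assumptions about how such a block is commonly built.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise 'same' cross-correlation with zero padding.
    x: (C, H, W); kernels: (C, k, k) with odd k (one kernel per channel)."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    # Accumulate one shifted, per-channel-weighted copy of the input per kernel tap.
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def conv_modulation(x, w_a, w_v, kernels):
    """Hypothetical modulation block (assumed form, not the authors' code).
    w_a, w_v: (C, C) pointwise (1x1) projections."""
    # Context branch: pointwise projection, then large-kernel depthwise conv.
    a = depthwise_conv2d(np.einsum("oc,chw->ohw", w_a, x), kernels)
    # Value branch: pointwise projection only.
    v = np.einsum("oc,chw->ohw", w_v, x)
    # Element-wise product stands in for the softmax attention matrix.
    return a * v

rng = np.random.default_rng(0)
C, H, W, K = 8, 9, 9, 7  # assumed channel count, patch size, and kernel size
x = rng.standard_normal((C, H, W))
out = conv_modulation(
    x,
    rng.standard_normal((C, C)),
    rng.standard_normal((C, C)),
    rng.standard_normal((C, K, K)),
)
print(out.shape)  # (8, 9, 9)
```

In this sketch the cost grows linearly with the number of pixels in the patch for a fixed kernel size, whereas standard self-attention grows quadratically with it, which is one plausible reading of the abstract's claim that large-kernel convolutions simplify SA.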
External IDs:dblp:journals/tgrs/WangTLSXY25