Abstract: Recently, the Transformer has achieved significant success in the hyperspectral image (HSI) classification task. However, most Transformers and their variants focus on spatial-domain global feature learning while ignoring the complementary characteristics provided by frequency-domain features. The fast Fourier transform (FFT), owing to its sensitivity to frequency-domain information, has become a primary tool for frequency-domain analysis. However, existing methods often assign the same attention value to all frequency bands, disregarding the differences between them. To fully explore and fuse spatial- and frequency-domain features, we propose a multiscale spatial–frequency-domain cross-Transformer (SFDCT-Former) network. We design a two-branch structure for spatial-domain and frequency-domain feature learning: one branch utilizes the multihead self-attention (MHSA) module for spatial-domain feature learning, while the other incorporates a multifrequency-domain Transformer (MFre-Former) encoder for frequency-domain feature learning. The MFre-Former encoder divides the frequency domain into nonoverlapping frequency bands and assigns distinct attention to each band, so that different frequency-domain information can be captured more precisely. Furthermore, to fuse the spatial- and frequency-domain features, we design a multilevel cross-attention (MLCA) fusion module. The MLCA module effectively combines spatial- and frequency-domain features at different levels to better capture their complementary characteristics. Extensive experiments conducted on four publicly available HSI datasets demonstrate that the proposed method outperforms nine state-of-the-art methods in classification performance. The code is available at https://github.com/AAAA-CS/SFDCT-Former
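The core idea of per-band frequency attention can be illustrated with a minimal NumPy sketch: transform a spatial patch with the FFT, partition the spectrum into nonoverlapping radial bands, and scale each band with its own attention weight before inverting. The function name, the radial banding scheme, and the scalar weights are illustrative assumptions, not the paper's actual MFre-Former implementation (which learns attention inside a Transformer encoder).

```python
import numpy as np

def band_split_attention(patch, num_bands=4, band_weights=None):
    """Hypothetical sketch: FFT a 2-D patch, split the spectrum into
    nonoverlapping radial frequency bands, weight each band separately,
    and transform back. Names and banding scheme are illustrative."""
    h, w = patch.shape
    F = np.fft.fftshift(np.fft.fft2(patch))      # centre the low frequencies
    cy, cx = h // 2, w // 2
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - cy, xx - cx)          # distance from DC component
    r_max = radius.max() + 1e-8
    if band_weights is None:                     # default: uniform attention
        band_weights = np.ones(num_bands)
    out = np.zeros_like(F)
    for b in range(num_bands):
        lo = b * r_max / num_bands
        hi = (b + 1) * r_max / num_bands
        # nonoverlapping bands: [lo, hi) except the outermost, which is [lo, inf)
        mask = (radius >= lo) & (radius < hi) if b < num_bands - 1 else (radius >= lo)
        out += band_weights[b] * F * mask        # distinct weight per band
    return np.fft.ifft2(np.fft.ifftshift(out)).real
```

With uniform weights the bands tile the whole spectrum, so the patch is reconstructed exactly (up to FFT round-off); nonuniform weights emphasize or suppress chosen frequency ranges, which is the behaviour the MFre-Former encoder learns rather than hand-sets.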
External IDs: dblp:journals/tim/ShiCFZHM25