RSEFormer: A Residual Squeeze-Excitation-Based Transformer for Pixelwise Hyperspectral Image Classification
Abstract: Hyperspectral image (HSI) classification plays an essential role in remote sensing image processing. Deep learning methods, especially transformers, have achieved great success in HSI classification. However, because labeled HSI data are limited, the relations between objects in such small datasets are irregular, and relying solely on the long-range attention of transformers may lead to biased results. In addition, current attention-based methods struggle to extract attention across high-dimensional spectra, which degrades classification performance. To this end, we propose a network that combines local spectral attention with global spatial-spectral attention: the residual depthwise separable squeeze-and-excitation transformer for HSI classification. Our framework integrates a 3-D depthwise separable convolution (DSC) squeeze-and-excitation module, a residual block, and a sharpened-attention vision transformer (SA-ViT) to extract spatial-spectral features from HSI. The 3-D DSC squeeze-and-excitation module extracts spatial-spectral features and learns local spectral implicit attention. Residual connections are introduced to mitigate gradient vanishing during network training. For global modeling, SA-ViT employs diagonal masking to eliminate self-token bias and learnable temperature parameters to sharpen the attention scores. Experimental results demonstrate that our method outperforms other approaches on five HSI benchmark datasets, achieving state-of-the-art performance.
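As a minimal sketch (not the authors' implementation), the snippet below illustrates the sharpened self-attention idea summarized above: standard multi-head scaled dot-product attention modified with a diagonal mask that prevents each token from attending to itself (removing self-token bias) and a learnable temperature in place of the fixed 1/sqrt(d) scaling. The class name, head count, and initialization are illustrative assumptions only.

```python
import math
import torch
import torch.nn as nn


class SharpenedSelfAttention(nn.Module):
    """Hypothetical sketch of sharpened attention with diagonal masking."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Learnable per-head temperature, initialized to the usual 1/sqrt(head_dim).
        self.temperature = nn.Parameter(
            torch.full((num_heads, 1, 1), 1.0 / math.sqrt(self.head_dim))
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)

        # Temperature-scaled attention logits instead of a fixed 1/sqrt(d) factor.
        attn = (q @ k.transpose(-2, -1)) * self.temperature

        # Diagonal masking: forbid self-attention of each token before the softmax.
        diag = torch.eye(n, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(diag, float("-inf"))

        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


if __name__ == "__main__":
    tokens = torch.randn(2, 9, 64)  # e.g., a 3x3 spatial patch of spectral tokens
    print(SharpenedSelfAttention(dim=64)(tokens).shape)  # torch.Size([2, 9, 64])
```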