Dual-Branch Convolution-Transformer Network With Spectral-Spatial Attention for Hyperspectral Image Classification
Abstract: Hyperspectral image (HSI) classification is a key task in remote sensing that aims to assign a category label to each pixel by leveraging the spectral and spatial information in HSIs. Recently, many deep learning (DL) methods, such as convolutional neural networks (CNNs) and transformers, have been applied to this task with significant results. However, most existing patch-based DL methods overlook the potential relationships between the central pixel and its surrounding pixels. In addition, the distinctive spectral characteristics of HSIs, such as the high correlation between adjacent spectral bands and the low dependence between distant bands, require special attention. To address these issues, we propose a novel dual-branch convolution–transformer network with spectral–spatial attention (CTSSA), which effectively aggregates both local and global spectral–spatial features. Specifically, CTSSA comprises two core modules: the pyramid spectral attention module (PSAM) and the center transformer encoder (CenterTE). PSAM extracts highly discriminative spectral features through a hierarchical multiscale attention mechanism, capturing subtle differences between adjacent spectral bands. CenterTE improves on the original transformer encoder (TE) by introducing a center-attention mechanism that models the global relationship between the central pixel and its surrounding pixels, thereby enhancing classification accuracy while reducing computational complexity. Experimental results on four public datasets (Salinas, Pavia University, Houston, and WHUHi-LongKou) demonstrate that, compared with nine other networks, CTSSA achieves satisfactory performance with fewer parameters and relatively high efficiency.
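The abstract does not give the exact formulation of the center-attention mechanism, so the following is only an illustrative sketch of one plausible reading: the central pixel of a patch supplies the single query, and every pixel in the patch supplies a key and a value. The function name `center_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are assumptions introduced here, not details taken from the paper. Under this reading, the attention scores for a patch of N pixels cost O(N·d) rather than the O(N²·d) of full self-attention, which is consistent with the claimed reduction in computational complexity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def center_attention(patch, w_q, w_k, w_v):
    """Hypothetical single-head center attention (an assumption, not the paper's code).

    patch: (H, W, C) array with odd H and W, so a unique center pixel exists.
    w_q, w_k, w_v: (C, d) projection matrices.
    Returns the attended feature for the center pixel, shape (d,),
    and the attention weights over all H*W pixels, shape (H*W,).
    """
    h, w, c = patch.shape
    tokens = patch.reshape(h * w, c)           # one token per pixel
    center = tokens[(h // 2) * w + (w // 2)]   # token of the central pixel
    q = center @ w_q                           # single query, shape (d,)
    k = tokens @ w_k                           # keys, shape (N, d)
    v = tokens @ w_v                           # values, shape (N, d)
    scores = k @ q / np.sqrt(q.shape[0])       # N scores instead of N x N
    weights = softmax(scores)
    return weights @ v, weights

# Usage on random data: a 7x7 patch with 16 spectral channels.
rng = np.random.default_rng(0)
patch = rng.standard_normal((7, 7, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 8)) for _ in range(3))
feature, weights = center_attention(patch, w_q, w_k, w_v)
```

The returned weights indicate how strongly each surrounding pixel contributes to the representation of the central pixel, which is the relationship the abstract says patch-based methods tend to overlook.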
External IDs: dblp:journals/tgrs/LuZJLC25