CAT: Center Attention Transformer With Stratified Spatial-Spectral Token for Hyperspectral Image Classification

Published: 01 Jan 2024, Last Modified: 07 Mar 2025 | IEEE Trans. Geosci. Remote. Sens. 2024 | License: CC BY-SA 4.0
Abstract: Most hyperspectral image (HSI) classification methods rely on square patch sampling to incorporate spatial information, thereby facilitating the label prediction of the center pixel. However, square patch sampling introduces numerous heterogeneous pixels, which can distort the label prediction of the center pixel. Moreover, it generates a single fixed training patch for each center pixel, hampering the performance of transformer-based models, which require a large amount of training data. To address these problems, we propose the center attention transformer (CAT) with stratified spatial–spectral tokens generated by superpixel sampling for HSI classification. First, to mitigate the interference of heterogeneous pixels, we propose a sampling from superpixel region (SSR) mechanism that generates purer image cubes than the traditional square neighborhood. Second, to expand the training data for the transformer, we propose a multiple stratified random sampling (MSRS) mechanism, which generates ample training samples without introducing additional labels. Finally, to extract information from the sampled patch tokens more effectively, we propose a spatial–spectral token generation mechanism and the CAT structure with Gaussian positional embedding (GPE). This framework can extract long-range correlations in the spectral information and pay more attention to the center pixel in the spatial dimension. Experimental results on three HSI datasets demonstrate that CAT outperforms several state-of-the-art methods. The code of this work is available at https://github.com/fengjiaqi927/CAT-Center_Attention_Transformer.
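The abstract does not give the details of SSR or MSRS, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a boolean superpixel mask containing the center pixel is already available, and that "stratified" means grouping region pixels into bands by distance to the center; both are assumptions made for illustration. The function name `sample_from_region` and the parameters `n_tokens` and `n_strata` are hypothetical.

```python
import numpy as np

def sample_from_region(region_mask, center, n_tokens, n_strata=3, rng=None):
    """Hypothetical sketch: draw pixel coordinates from a superpixel region,
    stratified by distance to the center pixel (an assumed stratification)."""
    rng = np.random.default_rng() if rng is None else rng

    # Coordinates (row, col) of every pixel inside the superpixel region.
    coords = np.argwhere(region_mask)
    if len(coords) == 0:
        raise ValueError("region_mask contains no pixels")

    # Sort region pixels by Euclidean distance to the center pixel.
    dist = np.linalg.norm(coords - np.asarray(center), axis=1)
    order = np.argsort(dist, kind="stable")

    # Split the sorted pixels into near-to-far strata; never use more
    # strata than there are pixels, so that no stratum is empty.
    n_strata = min(n_strata, len(coords))
    strata = np.array_split(order, n_strata)

    # Draw the same number of pixels from each stratum. If n_tokens is not
    # divisible by n_strata, the remainder is dropped in this sketch.
    per_stratum = n_tokens // n_strata
    picked = []
    for stratum in strata:
        # Fall back to sampling with replacement only for small strata.
        replace = len(stratum) < per_stratum
        picked.append(rng.choice(stratum, size=per_stratum, replace=replace))

    return coords[np.concatenate(picked)]


# Toy example: a 9x9 scene whose left block stands in for one superpixel.
mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 0:5] = True
tokens = sample_from_region(mask, center=(4, 2), n_tokens=9,
                            rng=np.random.default_rng(0))
```

Calling the function repeatedly with different random generators returns different pixel sets for the same center pixel and the same label, which is one plausible reading of how multiple sampling could enlarge the training set without additional labels.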