Hyperspectral Image Classification Using Spectral-Spatial Token Enhanced Transformer With Hash-Based Positional Embedding

Abstract: Hyperspectral image (HSI) classification aims to assign a land-cover category to each pixel. The transformer architecture has recently been introduced for HSI classification with promising performance. However, existing transformer-based HSI classification methods still fail to fully exploit both the spectral and the spatial information in HSIs. To this end, we propose a spectral–spatial token enhanced transformer (SSTE-Former) with hash-based positional embedding, which is the first method to exploit multiscale spectral–spatial information in depth for transformer-based HSI classification. Specifically, SSTE-Former takes as input multiscale HSI cubes centered on the target pixel, preprocessed by principal component analysis (PCA). A multiscale convolutional neural network (CNN) architecture is then used to extract short-range spectral–spatial features and generate token embeddings. In parallel, a novel hash-based spatially enhanced positional embedding tailored to HSI cubes is developed to model the correlations within and across the multiscale token embeddings. Finally, the multiscale token embeddings and hash-based positional embeddings are concatenated and flattened into the transformer encoder for long-range spectral–spatial feature fusion. Extensive experiments on four benchmark HSI datasets show superior performance compared with state-of-the-art HSI classification methods.
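The input pipeline described in the abstract — PCA along the spectral dimension followed by cropping multiscale cubes centered on the target pixel — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch sizes (3, 5, 7), the number of retained PCA components (30), and the edge-padding strategy for border pixels are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def extract_multiscale_cubes(hsi, row, col, scales=(3, 5, 7)):
    """Crop square patches of several spatial sizes centered on (row, col).

    hsi: array of shape (H, W, B) — height, width, spectral bands.
    Returns one cube per scale, each of shape (s, s, B).
    """
    cubes = []
    for s in scales:
        r = s // 2
        # Edge-pad so that border pixels still yield full-size cubes
        # (an assumed strategy; the paper does not specify one).
        padded = np.pad(hsi, ((r, r), (r, r), (0, 0)), mode="edge")
        cubes.append(padded[row:row + s, col:col + s, :])
    return cubes

# Toy HSI: 32x32 pixels, 100 spectral bands (random placeholder values).
rng = np.random.default_rng(0)
hsi = rng.random((32, 32, 100))

# PCA over the spectral dimension, fitted on all pixels; 30 components
# is an illustrative choice, not the paper's setting.
pca = PCA(n_components=30)
hsi_pca = pca.fit_transform(hsi.reshape(-1, 100)).reshape(32, 32, 30)

cubes = extract_multiscale_cubes(hsi_pca, row=10, col=10)
print([c.shape for c in cubes])  # [(3, 3, 30), (5, 5, 30), (7, 7, 30)]
```

Each cube would then be passed through the scale-specific CNN branch to produce token embeddings before entering the transformer encoder.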