HyperEAST: An Enhanced Attention-Based Spectral-Spatial Transformer With Self-Supervised Pretraining for Hyperspectral Image Classification

Published: 19 Aug 2025, Last Modified: 07 May 2026; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Volume: 18); License: CC BY 4.0
Abstract: Hyperspectral images (HSIs) are essential in geoscientific applications, such as resource exploration, precision agriculture, and environmental monitoring, due to their rich spectral–spatial information. However, existing classification methods face notable limitations: Principal component analysis ignores spatial context, convolutional neural networks lack long-range modeling, and vision transformer (ViT)-based models often overfit under label-scarce conditions due to their high capacity and modality-agnostic design. To address these challenges, we propose HyperEAST, an efficient dual-branch ViT framework that explicitly decouples spectral and spatial feature modeling. At its core is a novel linear fusion attention mechanism, which replaces dot-product attention with a softmax-free additive formulation based on lightweight convolutions, enabling local–global representation learning with linear complexity. To enhance robustness under limited labels, we adopt a modality-aware masked image modeling strategy that separately reconstructs masked spectral and spatial tokens during self-supervised pretraining. We further introduce a dataset-aware hybrid loss combining cross-entropy and focal loss to mitigate class imbalance and sharpen decision boundaries. Experiments on four benchmark HSI datasets—WHU-Hi-HC, WHU-Hi-LK, Indian Pines, and Pavia University—demonstrate that HyperEAST achieves competitive accuracy, efficiency, and robustness.
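The hybrid loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following assumes a simple convex combination of cross-entropy and the standard focal loss, FL = -(1 - p_t)^gamma * log(p_t). The mixing weight `lam`, the focusing parameter `gamma`, and all function names are hypothetical choices for illustration, not the authors' implementation; the "dataset-aware" selection of these values is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy for a single sample: -log p_t."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss: down-weights well-classified samples by (1 - p_t)^gamma."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def hybrid_loss(logits, target, lam=0.5, gamma=2.0):
    """Assumed hybrid form: lam * CE + (1 - lam) * focal (weights are hypothetical)."""
    ce = cross_entropy(logits, target)
    fl = focal_loss(logits, target, gamma)
    return lam * ce + (1.0 - lam) * fl


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]   # confidently correct prediction for class 0
    hard = [0.0, 1.0, 0.0]   # class 0 is the target but is under-scored
    print("easy sample:", hybrid_loss(easy, 0))
    print("hard sample:", hybrid_loss(hard, 0))
```

Under this assumed form, setting `lam=1.0` recovers plain cross-entropy, while a smaller `lam` shifts emphasis toward hard, often minority-class, samples, which is the usual motivation for focal loss under class imbalance.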