GRetNet: Gaussian Retentive Network for Hyperspectral Image Classification

Published: 01 Jan 2024, Last Modified: 07 Nov 2024. IEEE Geosci. Remote Sens. Lett., 2024. License: CC BY-SA 4.0.
Abstract: The vision transformer (ViT) is a prevalent technique for capturing long-distance dependencies and has shown impressive performance in hyperspectral image (HSI) classification. However, the core component of ViT, self-attention, struggles to balance high computational complexity with global modeling over entire input sequences. To alleviate this issue, a novel Gaussian retentive network, called GRetNet, is devised in this letter to enhance the comprehension of fine-grained spatial and spectral features while reducing computational cost. The method provides a powerful classification backbone and adaptively generates priors to perceive more effective spatial information by introducing a spatial decay mask that assigns different weights to different positions. Furthermore, Gaussian multi-head attention (GMA) is designed to dynamically recalibrate feature significance based on statistical distribution and to focus on distinct spectral patterns across different heads, thereby yielding more concise and robust modeling for HSI classification. Compared with state-of-the-art classification algorithms, the proposed GRetNet achieves better classification results and higher computational efficiency on four benchmark hyperspectral datasets, which verifies its effectiveness and superiority.
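
Since the abstract does not spell out the exact formulation, the following minimal PyTorch sketch only illustrates the two ideas it describes: a RetNet-style spatial decay mask that weights token pairs by their distance, and a Gaussian recalibration of per-head scores. All names (spatial_decay_mask, GaussianRetentionHead, GaussianMultiHeadAttention), the decay schedule, and the Gaussian reweighting form are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the exact GRetNet formulation is not given in the
# abstract. This assumes a RetNet-style decay mask D[i, j] = gamma^|i - j| and
# a Gaussian reweighting of per-head scores; all names are hypothetical.
import torch
import torch.nn as nn


def spatial_decay_mask(seq_len: int, gamma: float) -> torch.Tensor:
    """Position prior: pairwise weight decays with token distance."""
    idx = torch.arange(seq_len)
    dist = (idx[None, :] - idx[:, None]).abs().float()
    return gamma ** dist  # (seq_len, seq_len), values in (0, 1]


class GaussianRetentionHead(nn.Module):
    """One retention-style head with a Gaussian recalibration of its scores."""

    def __init__(self, dim: int, head_dim: int, gamma: float):
        super().__init__()
        self.q = nn.Linear(dim, head_dim, bias=False)
        self.k = nn.Linear(dim, head_dim, bias=False)
        self.v = nn.Linear(dim, head_dim, bias=False)
        self.gamma = gamma
        self.scale = head_dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), e.g. flattened HSI patch tokens
        b, n, _ = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = (q @ k.transpose(-2, -1)) * self.scale             # (b, n, n)
        scores = scores * spatial_decay_mask(n, self.gamma).to(x)   # spatial prior
        # Assumed form of the "statistical" recalibration: emphasise scores near
        # the per-row mean and suppress outliers via a Gaussian of the z-score.
        mu = scores.mean(dim=-1, keepdim=True)
        sigma = scores.std(dim=-1, keepdim=True) + 1e-6
        gauss = torch.exp(-0.5 * ((scores - mu) / sigma) ** 2)
        weights = torch.softmax(scores * gauss, dim=-1)
        return weights @ v                                           # (b, n, head_dim)


class GaussianMultiHeadAttention(nn.Module):
    """Multi-head wrapper: each head gets its own decay rate."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        head_dim = dim // num_heads
        # RetNet-style per-head decay rates: gamma_h = 1 - 2^(-5 - h)
        gammas = 1.0 - 2.0 ** (-5.0 - torch.arange(num_heads).float())
        self.heads = nn.ModuleList(
            GaussianRetentionHead(dim, head_dim, g.item()) for g in gammas
        )
        self.proj = nn.Linear(head_dim * num_heads, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))


if __name__ == "__main__":
    tokens = torch.randn(2, 81, 64)   # e.g. a 9x9 spatial patch with 64-dim embeddings
    out = GaussianMultiHeadAttention(dim=64, num_heads=4)(tokens)
    print(out.shape)                  # torch.Size([2, 81, 64])
```

Giving each head its own decay rate mirrors the abstract's claim that different heads attend to distinct patterns, and the Gaussian term down-weights score outliers, which is one plausible reading of "recalibration of feature significance based on statistical distribution."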