Layer-Optimized Spatial-Spectral Masked Autoencoder for Semantic Segmentation of Hyperspectral Imagery

Published: 01 Jan 2025, Last Modified: 12 Sept 2025 · WACV (Workshops) 2025 · CC BY-SA 4.0
Abstract: Hyperspectral imaging (HSI) captures detailed spectral data across numerous contiguous bands, offering critical insights for applications such as environmental monitoring, agriculture, and urban planning. However, the high dimensionality of HSI data poses significant challenges for traditional deep learning models, necessitating more efficient solutions. In this paper, we propose the Layer-Optimized Spatial-Spectral Transformer (LO-SST), a refined version of the Spatial-Spectral Transformer (SST) that incorporates structured layer pruning to reduce computational complexity while maintaining robust performance. LO-SST leverages self-supervised pretraining with a Masked Autoencoder (MAE) framework, enabling the model to effectively learn spatial and spectral dependencies even in scenarios with limited labeled data. The use of separate spatial and spectral positional embeddings further enhances the model's ability to capture intricate relationships within hyperspectral data. Our experiments show that LO-SST achieves competitive segmentation accuracy while significantly reducing computational demands compared to traditional models. The effectiveness of random masking over alternative strategies during pretraining is also demonstrated, underscoring its ability to preserve critical image features. These results highlight the potential of LO-SST as an efficient and scalable solution for hyperspectral image segmentation, particularly in resource-constrained applications.
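The abstract highlights random masking during MAE pretraining but does not spell out the mechanics. As a minimal sketch of the general MAE-style approach (the function name and the specific token count and mask ratio are illustrative assumptions, not details from the paper), random masking amounts to a uniform permutation of patch tokens, keeping a visible subset for the encoder:

```python
import numpy as np

def random_mask(num_tokens: int, mask_ratio: float, rng: np.random.Generator):
    """MAE-style random masking: uniformly permute token indices and
    split them into a visible (kept) set and a masked set.

    Returns sorted index arrays (keep_idx, mask_idx).
    """
    num_keep = int(num_tokens * (1.0 - mask_ratio))
    perm = rng.permutation(num_tokens)
    keep_idx = np.sort(perm[:num_keep])   # tokens the encoder sees
    mask_idx = np.sort(perm[num_keep:])   # tokens the decoder must reconstruct
    return keep_idx, mask_idx

# Example: 64 spatial-spectral patch tokens, 75% masked (illustrative values)
rng = np.random.default_rng(0)
keep, masked = random_mask(64, 0.75, rng)
```

Because every token is equally likely to be hidden, random masking spreads supervision evenly across spatial and spectral positions, which is consistent with the abstract's claim that it preserves critical image features better than structured alternatives.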