VITMST++: Efficient Hyperspectral Reconstruction Through Vision Transformer-Based Spatial Compression

Ana C. Caznok Silveira, Diedre S. do Carmo, Lucas H. Ueda, Denis G. Fantinato, Paula D. P. Costa, Leticia Rittner

Published: 01 Jan 2025, Last Modified: 04 Dec 2025 · IEEE Open Journal of Signal Processing · CC BY-SA 4.0

Abstract: Hyperspectral channel reconstruction transforms a subsampled multispectral image into a hyperspectral image, providing higher spectral resolution without a dedicated hyperspectral camera or acquisition hardware. The Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (MST++) is a state-of-the-art channel reconstruction technique, but it runs into memory limitations on images with high spatial resolution. To address this, we introduced VITMST++, a novel architecture that incorporates Vision Transformer embeddings for spatial compression, multi-resolution image context, and a custom channel-weighted loss. Developed for the ICASSP 2024 HyperSkin Challenge, VITMST++ outperforms the state-of-the-art MST++ in both reconstruction performance and computational efficiency. In this work, we analyze three main aspects of VITMST++ in greater depth: efficiency, quantitative performance, and generalization to other datasets. Results show that VITMST++ achieves spectral angle mapper (SAM) and structural similarity index measure (SSIM) values similar to those of state-of-the-art methods, while consuming up to three times less memory and requiring up to 10 times fewer multiply-add operations.
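To illustrate how a Vision Transformer embedding can compress the spatial dimension, the sketch below implements the reshaping step of a standard ViT-style, non-overlapping patch embedding. It is not the VITMST++ implementation: the function names, image sizes, and patch size are hypothetical choices made for illustration, and the learned linear projection that a real patch embedding applies to each token is omitted. With patch size p, every p×p tile of a C-channel image becomes a single token, so the number of spatial positions that later layers must hold in memory shrinks by a factor of p².

```python
from typing import List

# Hypothetical representation: image[y][x] is the list of C channel values (H x W x C).
Image = List[List[List[float]]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles and
    flatten each tile (row-major, channels innermost) into one token of
    length patch * patch * C.

    A learned linear projection, not included here, would normally map each
    token to the transformer's embedding width.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            token: List[float] = []
            for y in range(py, py + patch):
                for x in range(px, px + patch):
                    token.extend(image[y][x])
            tokens.append(token)
    return tokens


def spatial_positions(h: int, w: int, patch: int) -> int:
    """Number of spatial positions (tokens) left after a non-overlapping patch embedding."""
    return (h // patch) * (w // patch)


if __name__ == "__main__":
    # Hypothetical toy input: a 4x4 image with 3 channels, split into 2x2 patches.
    img = [[[float(y), float(x), 1.0] for x in range(4)] for y in range(4)]
    toks = patchify(img, patch=2)
    print(len(toks), len(toks[0]))  # 4 tokens, each of length 2*2*3 = 12

    # Hypothetical full-resolution case: 1024x1024 pixels with patch size 8.
    print(1024 * 1024, spatial_positions(1024, 1024, 8))  # 1048576 16384
```

In the hypothetical 1024×1024 case, a patch size of 8 reduces about one million pixel positions to 16,384 tokens, a 64-fold reduction. Each token carries more values, but the positions that spatially indexed activations must store drop by p², which is the kind of saving that motivates spatial compression before expensive transformer stages.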
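The spectral angle mapper (SAM) metric cited in the results measures, for each pixel, the angle between the reconstructed spectrum and the ground-truth spectrum, and then averages that angle over all pixels. The minimal sketch below uses the standard textbook definition, SAM = arccos(⟨r, e⟩ / (‖r‖‖e‖)). The function names are hypothetical, and the exact averaging and units (radians here; some benchmarks report degrees) used in the HyperSkin evaluation are assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def spectral_angle(ref: Sequence[float], est: Sequence[float]) -> float:
    """Angle in radians between a reference and an estimated spectrum of one pixel."""
    if len(ref) != len(est):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(r * e for r, e in zip(ref, est))
    norm = math.sqrt(sum(r * r for r in ref) * sum(e * e for e in est))
    if norm == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos)


def mean_sam(ref_pixels: List[Sequence[float]], est_pixels: List[Sequence[float]]) -> float:
    """Mean spectral angle over all pixels; lower is better (0 means identical spectral shape)."""
    if len(ref_pixels) != len(est_pixels) or not ref_pixels:
        raise ValueError("need the same, non-zero number of reference and estimated pixels")
    angles = [spectral_angle(r, e) for r, e in zip(ref_pixels, est_pixels)]
    return sum(angles) / len(angles)


if __name__ == "__main__":
    # Orthogonal spectra give pi/2; a spectrum scaled by 2 keeps the same shape, so its angle is 0.
    print(spectral_angle([1.0, 0.0], [0.0, 1.0]))
    print(spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because SAM depends only on the direction of each spectrum, it ignores a uniform intensity scaling. This is one reason it is usually reported alongside a spatial-structure metric such as SSIM.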

External IDs: doi:10.1109/ojsp.2025.3544891