Bidirectional Dilation Transformer for Multispectral and Hyperspectral Image Fusion
Abstract: Transformer-based methods have proven effective for long-distance modeling, capturing spatial and spectral information, and exhibiting strong inductive bias in various computer vision tasks. Generally, the Transformer model includes two common modes of multi-head self-attention (MSA): spatial MSA (Spa-MSA) and spectral MSA (Spe-MSA). Spa-MSA is computationally efficient but confines the global spatial response to a local window, while Spe-MSA can compute channel self-attention to accommodate high-resolution images but disregards the crucial local information that is essential for low-level vision tasks. In this study, we propose a bidirectional dilation Transformer (BDT) for multispectral and hyperspectral image fusion (MHIF), which aims to leverage the advantages of both MSA modes and the latent multiscale information specific to MHIF tasks. The BDT consists of two designed modules: the dilation Spa-MSA (D-Spa), which dynamically expands the spatial receptive field through a given hollow strategy, and the grouped Spe-MSA (G-Spe), which extracts latent features within the feature map and learns local data behavior. Additionally, to fully exploit the multiscale information from the two inputs with different spatial resolutions, we employ a bidirectional hierarchy strategy in the BDT, yielding improved performance. Finally, extensive experiments on two commonly used datasets, CAVE and Harvard, demonstrate the superiority of BDT both visually and quantitatively.
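To make the spectral-attention idea concrete, the following is a minimal NumPy sketch of grouped channel self-attention in the spirit of G-Spe: channels act as tokens, and attention is computed only among channels within each group, so the attention matrix is independent of the spatial resolution. The function name, identity Q/K/V projections, and scaling choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_spectral_attention(feat, num_groups):
    """Toy grouped spectral self-attention (hypothetical sketch).

    feat: (C, H, W) feature map. Channels are split into `num_groups`
    groups; self-attention is computed among the channels inside each
    group, giving a (g x g) attention matrix per group rather than an
    (HW x HW) spatial one.
    """
    C, H, W = feat.shape
    assert C % num_groups == 0, "channels must divide evenly into groups"
    tokens = feat.reshape(C, H * W)        # each channel is one token
    out = np.empty_like(tokens)
    g = C // num_groups                    # channels per group
    for i in range(num_groups):
        x = tokens[i * g:(i + 1) * g]      # (g, H*W) group of channel tokens
        # identity projections stand in for learned Q/K/V weights
        attn = softmax(x @ x.T / np.sqrt(H * W), axis=-1)   # (g, g)
        out[i * g:(i + 1) * g] = attn @ x  # reweight channels in the group
    return out.reshape(C, H, W)
```

Because the per-group attention cost scales with the number of channels rather than with H*W, this style of attention stays tractable at high spatial resolutions, which is the property the abstract attributes to Spe-MSA.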