Abstract: Pansharpening is a critical technique for generating high-resolution (HR) multispectral (MS) images by learning cross-modality complementary representations between panchromatic (PAN) images and low-resolution (LR) MS images. Although methods based on convolutional neural networks (CNNs) have dominated the pansharpening community, they suffer from limited global modeling capability due to the inherently local receptive field of the convolution operator. To remedy this limitation, the transformer family has recently gained great popularity in this field. However, existing cascaded transformer designs inevitably incur a heavy memory footprint and computational cost due to the dense dot-product self-attention (SA) computation. More importantly, these paradigms ignore the innate sparsity of remote sensing images, leading to information redundancy and a challenging optimization process. To alleviate these issues, we propose the bilateral adaptive evolution transformer (BAEFormer), which is built upon two core mechanisms: bilateral attention computation and adaptive attention evolution. Specifically, we first decompose the conventional quadratic-complexity SA into separate height-wise and width-wise attention computations, each linear in the complementary spatial dimension, which significantly reduces the computational complexity. Furthermore, exploiting the data-specific properties of attention, we devise a novel yet effective neighboring-layer-dependent strategy that adaptively updates the attention maps along the two spatial dimensions, thereby avoiding repetitive SA computation while accounting for the dynamics of attention-weight evolution. BAEFormer outperforms other state-of-the-art pansharpening methods on various remote sensing datasets while requiring fewer network parameters and less computation. The code is available at https://github.com/coder-JMHou/BAEFormer.
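To make the two mechanisms concrete, below is a minimal, illustrative PyTorch-style sketch of the bilateral (height-wise and width-wise) attention described above, with a hypothetical blending rule standing in for the paper's adaptive attention evolution. The module name `BilateralAxialAttention`, the blending weight `alpha`, and the convex-combination update are assumptions for illustration only; the exact layer design and evolution rule are defined in the paper and the released code.

```python
import torch
import torch.nn as nn

class BilateralAxialAttention(nn.Module):
    """Sketch of bilateral attention: SA is computed separately along the
    height and width axes, so each pass attends over one spatial dimension
    (cost O(W*H^2) + O(H*W^2)) instead of over all H*W tokens (O((H*W)^2))."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def _axis_attention(self, x, prev_attn=None, alpha=0.5):
        # x: (batch, length, dim), where `length` is one spatial axis (H or W)
        b, n, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.heads, -1).transpose(1, 2)
        k = k.view(b, n, self.heads, -1).transpose(1, 2)
        v = v.view(b, n, self.heads, -1).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Hypothetical "adaptive evolution" step: blend the fresh attention
        # map with the neighboring (previous) layer's map rather than
        # recomputing SA from scratch at every layer. The actual update rule
        # used by BAEFormer may differ.
        if prev_attn is not None:
            attn = alpha * attn + (1 - alpha) * prev_attn
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out), attn

    def forward(self, x, prev_h=None, prev_w=None, alpha=0.5):
        # x: (batch, H, W, dim)
        b, h, w, d = x.shape
        # Height-wise pass: each image column is a sequence of length H.
        xh = x.permute(0, 2, 1, 3).reshape(b * w, h, d)
        xh, attn_h = self._axis_attention(xh, prev_h, alpha)
        x = xh.reshape(b, w, h, d).permute(0, 2, 1, 3)
        # Width-wise pass: each image row is a sequence of length W.
        xw = x.reshape(b * h, w, d)
        xw, attn_w = self._axis_attention(xw, prev_w, alpha)
        x = xw.reshape(b, h, w, d)
        # Return the attention maps so the next layer can evolve them.
        return x, attn_h, attn_w
```

In this sketch, a stack of such layers would pass `attn_h` and `attn_w` from one layer to the next as `prev_h` and `prev_w`, so later layers update existing attention maps cheaply instead of repeating the full dot-product computation; this mirrors, under the stated assumptions, how the adaptive evolution avoids redundant SA while the axis decomposition keeps each pass linear in the other spatial dimension.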