Pixel-Wise Ensembled Masked Autoencoder for Multispectral Pansharpening

Published: 01 Jan 2024 · Last Modified: 05 Mar 2025 · IEEE Trans. Geosci. Remote Sens. 2024 · CC BY-SA 4.0
Abstract: Pansharpening fuses a low-spatial-resolution multispectral (LRMS) image with a panchromatic (PAN) image rich in spatial detail to obtain a high-spatial-resolution multispectral (HRMS) image. Recently, deep learning (DL)-based models have been proposed for this problem and have made considerable progress. However, most existing methods rely on the conventional observation model, which treats LRMS as a blurred and downsampled version of HRMS. We observe that while DL-based models show significant improvement over traditional models under reduced-resolution evaluation, their performance deteriorates markedly at full resolution; the conventional observation model thus yields limited generalization and severe spectral and spatial distortion in full-resolution evaluation. In this article, we rethink the observation model from a novel HRMS-to-LRMS perspective and propose a pixel-wise ensembled masked autoencoder (PEMAE) to restore HRMS. Specifically, we regard LRMS as the result of pixel-wise masking applied to HRMS, so LRMS serves as a natural input to a masked autoencoder. By ensembling the reconstructions obtained under multiple masking patterns, PEMAE produces an HRMS image that retains both the spectral information of LRMS and the spatial details of PAN. In addition, we replace regular self-attention with a linear cross-attention mechanism, reducing the computation to linear time complexity. Extensive experiments demonstrate that PEMAE outperforms state-of-the-art (SOTA) methods in quantitative and visual performance under both reduced- and full-resolution evaluation. The code is available at https://github.com/yc-cui/PEMAE.
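
To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of treating LRMS as a pixel-wise masked HRMS and ensembling reconstructions over multiple masking patterns. For a scale ratio r, each offset within an r x r block defines one masking pattern; the reconstructor here is a hypothetical placeholder standing in for PEMAE's masked autoencoder.

import torch
import torch.nn as nn

def embed_lrms(lrms: torch.Tensor, offset: tuple, ratio: int = 4) -> torch.Tensor:
    """Place LRMS pixels into an HRMS-sized grid at one offset per ratio x ratio
    block; all other positions stay masked (zero). Each offset defines one
    masking pattern."""
    b, c, h, w = lrms.shape
    grid = lrms.new_zeros(b, c, h * ratio, w * ratio)
    dy, dx = offset
    grid[:, :, dy::ratio, dx::ratio] = lrms
    return grid

class ToyReconstructor(nn.Module):
    """Hypothetical stand-in for the masked autoencoder: fills in masked
    pixels conditioned on the PAN image."""
    def __init__(self, bands: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, bands, 3, padding=1),
        )

    def forward(self, masked_hrms, pan):
        return self.net(torch.cat([masked_hrms, pan], dim=1))

def pixel_wise_ensemble(lrms, pan, model, ratio: int = 4) -> torch.Tensor:
    """Reconstruct once per masking pattern (one offset per position in the
    ratio x ratio block) and average the results."""
    outs = [model(embed_lrms(lrms, (dy, dx), ratio), pan)
            for dy in range(ratio) for dx in range(ratio)]
    return torch.stack(outs).mean(dim=0)

# Usage: a 4-band LRMS at 64x64 with a 256x256 PAN (4x scale ratio).
lrms = torch.rand(1, 4, 64, 64)
pan = torch.rand(1, 1, 256, 256)
hrms = pixel_wise_ensemble(lrms, pan, ToyReconstructor(bands=4))
print(hrms.shape)  # torch.Size([1, 4, 256, 256])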
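
The abstract also mentions replacing regular self-attention with linear cross-attention. Below is a hedged sketch of one common linearization, the softmax(Q)(softmax(K)^T V) factorization of efficient attention, which brings the cost down from quadratic to linear in the number of tokens; PEMAE's exact formulation may differ.

import torch
import torch.nn as nn

class LinearCrossAttention(nn.Module):
    """Cross-attention with linear complexity: queries come from one stream
    (e.g., MS tokens), keys/values from the other (e.g., PAN tokens)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)

    def forward(self, x_q: torch.Tensor, x_kv: torch.Tensor) -> torch.Tensor:
        q = self.q(x_q).softmax(dim=-1)       # (B, Nq, D), normalized over channels
        k, v = self.kv(x_kv).chunk(2, dim=-1)
        k = k.softmax(dim=1)                  # (B, Nk, D), normalized over tokens
        ctx = k.transpose(1, 2) @ v           # (B, D, D): linear in Nk
        return q @ ctx                        # (B, Nq, D): linear in Nq

attn = LinearCrossAttention(dim=64)
out = attn(torch.rand(2, 4096, 64), torch.rand(2, 4096, 64))
print(out.shape)  # torch.Size([2, 4096, 64])

Because the D x D context matrix is computed once and shared across all queries, memory and compute never form an Nq x Nk attention map, which is what makes the full-resolution token counts of pansharpening tractable.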