DAE-GSP: Discriminative Autoencoder With Gaussian Selective Patch for Multimodal Remote Sensing Image Classification

Published: 01 Jan 2025, Last Modified: 25 Jan 2025, Venue: IEEE Trans. Geosci. Remote. Sens. 2025, License: CC BY-SA 4.0
Abstract: In the field of multimodal remote sensing image (MRSI) classification, self-supervised learning (SSL) algorithms have demonstrated significant advantages, particularly in scenarios with limited labeled samples. Existing SSL methods typically use auxiliary tasks within either contrastive or generative frameworks, focusing on discriminative or structural information separately. In this article, we propose a novel hybrid SSL paradigm, discriminative autoencoder with Gaussian selective patch (DAE-GSP), for MRSI classification. The DAE framework integrates contrastive learning with the masked image modeling (MIM) technique, allowing structural information and discriminative representations to be learned from images simultaneously. Furthermore, a cross-attention-based data-level fusion strategy is introduced during the pretraining stage to enhance intermodal interactions, thereby improving the effectiveness of modality fusion. In addition, we propose a novel Gaussian selective patch (GSP) strategy that addresses the limitations of traditional square patch selection methods. Combined with self-supervised auxiliary tasks, this strategy improves the integration of multiple modalities and encourages the model to capture essential semantic information. Extensive experiments conducted on three public datasets (Houston2013, Augsburg, and Berlin) demonstrate the effectiveness of the proposed approach. With only ten labeled training samples per class, the proposed method achieves an overall accuracy (OA) of 90.15%, 82.64%, and 71.03% on the Houston2013, Augsburg, and Berlin datasets, respectively, indicating improvements of 1.31%, 1.22%, and 1.48% over state-of-the-art methods.
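The abstract describes a hybrid objective that pairs contrastive learning with masked image modeling but does not give the loss itself. As a rough illustration only, the sketch below combines a standard InfoNCE contrastive term with a mean squared reconstruction error computed on masked positions. The function names, the temperature, and the weighting factor `lam` are all hypothetical and are not taken from the paper; the actual DAE-GSP formulation may differ.

```python
import numpy as np


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss; row i of z1 and row i of z2 are treated as a positive pair."""
    # L2-normalize so the dot product is a cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_prob)))


def masked_mse(target, recon, mask):
    """Mean squared error over masked positions only (mask is True where the input was hidden).

    Assumes at least one position is masked.
    """
    mask = mask.astype(bool)
    return float(np.mean((target[mask] - recon[mask]) ** 2))


def hybrid_loss(z1, z2, target, recon, mask, lam=1.0):
    """Hypothetical combined objective: contrastive term plus weighted reconstruction term."""
    return info_nce(z1, z2) + lam * masked_mse(target, recon, mask)
```

In this sketch the contrastive term pushes matching embeddings together (the discriminative signal), while the masked reconstruction term rewards recovering hidden content (the structural signal). How the two are actually balanced in DAE-GSP is not stated in the abstract.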