Robust Land Cover Classification With Multimodal Knowledge Distillation

Published: 01 Jan 2024, Last Modified: 29 Sept 2024 · IEEE Trans. Geosci. Remote Sens. 2024 · CC BY-SA 4.0
Abstract: In recent years, numerous studies have sought to improve land cover (LC) classification with multimodal remote sensing (RS) data, which outperforms single-modal methods by a large margin thanks to information diversity. To go a step further, we develop a two-branch patch-based convolutional neural network (CNN) with an encoder–decoder (ED) module to fuse information from multimodal RS data. A knowledge distillation in-model (DIM) module is proposed to guide per-modality encoder learning with the final fused information, making multimodal data fusion more effective. Moreover, utilizing multimodal information to guide single-modal learning remains largely unexplored. To this end, a knowledge distillation cross-model (DCM) module is designed to improve single-modal LC classification through multimodal knowledge distillation, bridging the gap between single-modal and multimodal methods. In particular, the multimodal method serves as a teacher that transfers knowledge to single-modal methods. Extensive experiments are carried out on two multimodal RS datasets: hyperspectral (HS) and light detection and ranging (LiDAR) data, i.e., the Houston2013 dataset, and HS and synthetic aperture radar (SAR) data, i.e., the Berlin dataset. The results demonstrate the effectiveness and superiority of the proposed multimodal fusion strategy in comparison with several state-of-the-art multimodal RS data classification methods. Furthermore, the proposed DCM module improves the LC classification performance of single-modal methods by a large margin.
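To make the cross-model distillation idea concrete, the sketch below shows a minimal teacher–student setup in PyTorch, assuming a standard softened-logits knowledge distillation loss (cross-entropy on hard labels plus KL divergence on temperature-scaled teacher logits). The network definitions, patch sizes, channel counts, and the temperature/weight values are hypothetical placeholders for illustration, not the paper's actual DCM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hard-label cross-entropy combined with soft-target KL divergence."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Hypothetical single-modal student (HS patches only) and a frozen multimodal
# teacher that sees the fused HS + LiDAR input; channel/class counts are illustrative.
student = nn.Sequential(nn.Conv2d(144, 64, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 15))
teacher = nn.Sequential(nn.Conv2d(145, 64, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 15))
teacher.eval()

hs_patch = torch.randn(8, 144, 7, 7)      # HS-only patches for the student
fused_patch = torch.randn(8, 145, 7, 7)   # HS + LiDAR patches for the teacher
labels = torch.randint(0, 15, (8,))

with torch.no_grad():
    t_logits = teacher(fused_patch)       # multimodal teacher predictions
s_logits = student(hs_patch)              # single-modal student predictions
loss = distillation_loss(s_logits, t_logits, labels)
loss.backward()
```

In this reading, the teacher is trained beforehand on the fused modalities and kept frozen, so the single-modal student inherits multimodal knowledge purely through the soft targets at training time and requires only one modality at inference.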