Cross-Modal Compositional Learning for Multilabel Remote Sensing Image Classification

Published: 01 Jan 2025, Last Modified: 17 Apr 2025. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2025. License: CC BY-SA 4.0.
Abstract: Multilabel remote sensing image classification (MLRSIC) provides comprehensive object-level semantic descriptions of remote sensing images. However, most existing methods struggle to effectively integrate the visual features of images with the high-level semantic information of labels, which limits their ability to extract fine-grained, semantically rich image features for classification. To address this, we propose a novel cross-modal compositional learning (CMCL) model for MLRSIC that fully exploits label information to improve classification performance. CMCL injects rich label semantics into the image feature extraction process, enhancing the visual features of each class with the corresponding label semantics. In addition, CMCL constructs a feature space shared by image features and label-correlation features for classification. The multilabel classification task is thus modeled as a feature distance measurement task in this shared space, so that the visual features of images and the semantic information of labels mutually reinforce each other. Experimental results on the UCM, AID, and DFC15 multilabel datasets show that the proposed CMCL outperforms existing state-of-the-art methods.
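To make the shared-space idea concrete, the following PyTorch sketch shows one plausible way to score labels by distance in a space shared by image features and per-class label embeddings. It is an illustrative assumption, not the paper's actual architecture: the class name `SharedSpaceClassifier`, the dimensions, and the use of learnable class embeddings and cosine similarity are all hypothetical stand-ins for the label-correlation features and distance measure described in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceClassifier(nn.Module):
    """Hypothetical sketch of distance-based multilabel scoring in a
    shared image/label feature space (not the authors' implementation)."""

    def __init__(self, img_dim, label_dim, shared_dim, num_classes):
        super().__init__()
        # Project image features and label features into one shared space.
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.label_proj = nn.Linear(label_dim, shared_dim)
        # Assumed: one learnable semantic embedding per class (e.g., word
        # vectors refined by a label-correlation module would go here).
        self.label_feats = nn.Parameter(torch.randn(num_classes, label_dim))

    def forward(self, img_feats):
        # img_feats: (batch, img_dim) pooled visual features.
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)          # (B, D)
        z_lab = F.normalize(self.label_proj(self.label_feats), dim=-1) # (C, D)
        # Classification as distance measurement: cosine similarity between
        # each image embedding and each class embedding; higher => present.
        return z_img @ z_lab.t()                                       # (B, C)

# Usage: per-class presence probabilities for a batch of image features.
model = SharedSpaceClassifier(img_dim=2048, label_dim=300,
                              shared_dim=512, num_classes=17)
scores = torch.sigmoid(model(torch.randn(4, 2048)))
```

Under this reading, training with a binary cross-entropy loss on `scores` would pull an image's embedding toward the embeddings of its present labels and push it away from absent ones, which is one way the visual and semantic modalities could mutually reinforce each other in a single space.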