RSZero-CSAT: Zero-Shot Scene Classification in Remote Sensing Imagery using a Cross Semantic Attribute-guided Transformer

Published: 01 Jan 2024, Last Modified: 01 May 2025. Venue: IJCNN 2024. License: CC BY-SA 4.0
Abstract: Zero-shot learning (ZSL) based scene classification aims to recognize unseen classes by transferring semantic information from seen classes. Applying ZSL to scene classification in remote sensing images is challenging because of the complexity of the scenes. Earlier attention-based methods are ineffective at extracting discriminative region-based features within a single image, which limits their transferability and their ability to accurately localize object attributes. Hence, we propose RSZero-CSAT, a cross-semantic attribute-guided Transformer for zero-shot scene classification in remote sensing images. First, semantic information is acquired from shared remote sensing semantic attributes, which are used to localize the object attributes that characterize discriminative region features. Then, a Transformer in RSZero-CSAT localizes object attributes within the visual features accurately, enhancing the effectiveness of semantic information transfer in ZSL. Specifically, RSZero-CSAT employs two components, a semantic attribute → visual Transformer (SAVT) and a visual → semantic attribute Transformer (VSAT), to extract visual features guided by semantic attributes and semantic attribute features guided by visual features, respectively. Further, SAVT and VSAT learn mutually and collaborate to obtain semantically enriched visual representations, using prediction-level and feature-level semantic collaborative losses to capture crucial semantic information. Finally, the semantically enriched visual representations obtained from SAVT and VSAT are combined and, together with class semantic vectors, support the visual-semantic interactions used for zero-shot classification. Experimental results on four remote sensing scene classification benchmark datasets show that RSZero-CSAT improves performance on unseen classes.
The code is available at https://github.com/rs-scn-cls/rszero-csat.
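
The two attention directions described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head scaled dot-product cross-attention with no learned projections, omits the collaborative losses, and uses random placeholder data. All names (cross_attention, zero_shot_scores, class_attr) and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product cross-attention: each query row attends
    # over the rows of keys_values (used as both keys and values).
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values, weights

def zero_shot_scores(attr_feats, attr_embed, class_attr):
    # Per-attribute response (dot product of each attribute-guided
    # feature with its attribute embedding), then compatibility with
    # each class semantic vector.
    attr_response = np.sum(attr_feats * attr_embed, axis=-1)  # (A,)
    return class_attr @ attr_response                         # (C,)

rng = np.random.default_rng(0)
R, A, C, D = 49, 6, 3, 16  # regions, attributes, unseen classes, feature dim (assumed)
visual = rng.normal(size=(R, D))      # placeholder region features
attr_embed = rng.normal(size=(A, D))  # placeholder shared attribute embeddings
class_attr = rng.random(size=(C, A))  # placeholder class semantic vectors

# Semantic attribute -> visual: attributes query the image regions.
attr_guided_visual, attn_av = cross_attention(attr_embed, visual)
# Visual -> semantic attribute: regions query the attributes.
visual_guided_attr, attn_va = cross_attention(visual, attr_embed)

scores = zero_shot_scores(attr_guided_visual, attr_embed, class_attr)
predicted_class = int(np.argmax(scores))
```

Each row of attn_av is a distribution over image regions for one attribute, which is the sense in which an attribute is "localized" in this sketch. The released repository should be consulted for the actual architecture and training losses.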