SuperCAT: Super Resolution and Cross Semantic Attribute-guided Transformer based Feature Refinement for Zero-Shot Remote Sensing Scene Classification

Rambabu Damalla; Gayathri C; Rajeshreddy Datla

26 Sept 2024 (modified: 14 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Scene classification, remote sensing images, zero-shot learning, Transformer

Abstract: Zero-shot learning is challenging for classifying scenes of unseen classes because of the distinctive characteristics of remote sensing images. The intricate variations and non-uniform spatial resolutions among remote sensing scenes further complicate learning discriminative semantic knowledge. To tackle these issues, we propose SuperCAT, a framework for zero-shot scene classification in remote sensing images that comprises a super-resolution module, a cross-semantic attribute-guided Transformer (CAT), feature-generating models, and a feature refinement (FR) module. First, we leverage semantic attributes for all classes of four benchmark remote sensing scene classification datasets and use super-resolution to explore semantic knowledge effectively. Then, the semantic attribute to visual Transformer (SAVT) and visual to semantic attribute Transformer (VSAT) modules in CAT learn attribute-based visual features and visual-based attribute features, respectively. The SAVT and VSAT modules learn collaboratively, teaching each other through feature-level and prediction-level semantic collaborative losses. The feature-generating models map semantic vectors to the visual features of remote sensing images. The FR module incorporates triplet center margin loss and semantic loop consistency loss functions to capture class-related and semantically related discriminative features, achieving intra-class closeness and inter-class distinctiveness. Extensive experiments on four benchmark remote sensing image scene classification datasets demonstrate the effectiveness of SuperCAT compared with state-of-the-art approaches. The code is available at https://github.com/ZSL-RSI-SC/SuperCAT.
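The abstract names the two losses used in the FR module but does not give their formulas. Below is a minimal pure-Python sketch of how such losses are commonly written, assuming the standard triplet-center formulation (a hinge on the gap between a feature's distance to its own class center and its distance to the nearest other-class center) and assuming the semantic loop consistency term is an L1 penalty between attributes regressed from refined features and the ground-truth class attribute vectors. The function names, the squared Euclidean distance, the default `margin` of 1.0, and the L1 form are all illustrative assumptions, not the paper's exact definitions; the linked repository is the authoritative reference.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_center_margin_loss(
    features: List[Vector],
    labels: List[int],
    centers: Dict[int, Vector],
    margin: float = 1.0,  # assumed default, not taken from the paper
) -> float:
    """Hypothetical triplet-center-style hinge loss, averaged over a batch.

    Each feature is pulled toward its own class center and pushed away from
    the nearest other-class center until the distance gap reaches `margin`.
    """
    total = 0.0
    for f, y in zip(features, labels):
        d_pos = squared_distance(f, centers[y])
        d_neg = min(squared_distance(f, c) for k, c in centers.items() if k != y)
        total += max(d_pos + margin - d_neg, 0.0)
    return total / len(features)


def semantic_loop_consistency_loss(
    predicted_attributes: List[Vector],
    true_attributes: List[Vector],
) -> float:
    """Hypothetical L1 formulation: mean absolute error between attributes
    regressed back from refined features and the class attribute vectors."""
    total = 0.0
    count = 0
    for p, t in zip(predicted_attributes, true_attributes):
        for a, b in zip(p, t):
            total += abs(a - b)
            count += 1
    return total / count


if __name__ == "__main__":
    centers = {0: [0.0, 0.0], 1: [4.0, 0.0]}
    # Well-separated features: each is much closer to its own center.
    print(triplet_center_margin_loss([[1.0, 0.0], [3.0, 0.0]], [0, 1], centers))
    # A feature equidistant from both centers is penalized by the margin.
    print(triplet_center_margin_loss([[2.0, 0.0]], [0], centers))
    print(semantic_loop_consistency_loss([[0.5, 1.0]], [[1.0, 1.0]]))
```

In this toy setting, features that already sit closer to their own center than to any other center (by at least the margin) contribute zero loss, which is the intra-class closeness and inter-class distinctiveness the abstract describes; the consistency term stays small only when the refined features still encode the class semantics.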

Supplementary Material: pdf

Primary Area: applications to computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6678
