Visual-Semantic Alignment for Few-shot Remote Sensing Scene Classification

Published: 01 Jan 2024 · Last Modified: 06 Mar 2025 · ICMLC 2024 · CC BY-SA 4.0
Abstract: We propose a few-shot learning approach that aligns visual and semantic features in a shared embedding space to alleviate the shortage of training (or reference) data in remote sensing scene classification (RSSC). Specifically, self-supervised learning is first employed to improve the expressiveness of the learned features, which effectively enhances their generalizability. Meanwhile, we align each image feature with its corresponding class-semantic feature, obtained by feeding the class name to a language model such as BERT, to increase the image feature's discriminability. By systematically integrating self-supervised learning and visual-semantic alignment with the backbone network, our approach produces image features with both good generalizability and good discriminability. Experiments on the UCMerced LandUse, NWPU-RESISC45, and AID benchmarks validate the feasibility of our approach and verify its improved few-shot classification performance in RSSC.
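To make the visual-semantic alignment concrete, below is a minimal sketch of one plausible form of the alignment loss: class names are embedded with a frozen BERT, image features from a backbone are projected into the same space, and images are classified against the class-name embeddings by temperature-scaled cosine similarity. The ResNet-18 backbone, bert-base-uncased, the linear projection, and the temperature value are assumptions for illustration, not the paper's specified choices, and the self-supervised branch is omitted for brevity.

```python
# Hedged sketch of a visual-semantic alignment loss (not the authors' code).
# Assumed components: ResNet-18 backbone, frozen bert-base-uncased,
# a learnable linear projection, and temperature 0.07.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18
from transformers import AutoTokenizer, AutoModel

class_names = ["forest", "beach", "airport"]  # example RSSC class names

# Class-semantic features: [CLS] embedding of each class name from frozen BERT.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()
with torch.no_grad():
    batch = tok(class_names, return_tensors="pt", padding=True)
    sem = bert(**batch).last_hidden_state[:, 0]  # (C, 768)
sem = F.normalize(sem, dim=-1)

# Visual features: backbone output projected into the 768-d semantic space.
backbone = resnet18(weights=None)
backbone.fc = torch.nn.Identity()        # expose the 512-d pooled features
proj = torch.nn.Linear(512, 768)         # visual-to-semantic projection

images = torch.randn(4, 3, 224, 224)     # dummy image batch
labels = torch.tensor([0, 2, 1, 0])      # ground-truth class indices

vis = F.normalize(proj(backbone(images)), dim=-1)  # (B, 768)

# Alignment loss: cross-entropy over cosine similarities between image
# features and class-name embeddings, scaled by a temperature.
logits = vis @ sem.t() / 0.07
align_loss = F.cross_entropy(logits, labels)
print(align_loss.item())
```

In practice this alignment loss would be combined with the self-supervised objective (e.g., a pretext task on the same backbone) during training, so that the learned features are both discriminative with respect to class semantics and generalizable to unseen scenes.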