Abstract: Generalized zero-shot learning (GZSL) extends zero-shot learning by including both seen and unseen classes at classification time. Many existing GZSL methods for remote sensing images rely on word vectors whose semantics inadequately describe unseen scene classes. This paper proposes a novel embedding approach (WDV-ZRS) that combines word2vec and data2vec embeddings to improve the classification accuracy of unseen classes in remote sensing images. Word2vec represents a word as a vector derived from its usage contexts, capturing semantic relationships between words. Data2vec, a self-supervised learning method, produces continuous, contextualized latent representations by leveraging the standard Transformer architecture. WDV-ZRS combines the semantic features of word2vec and data2vec to construct a discriminative semantic space for characterizing remote sensing scene classes. Experimental results and analysis on three benchmark remote sensing scene-classification datasets demonstrate the effectiveness of WDV-ZRS, which surpasses existing GZSL methods.
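The combination of the two embeddings can be illustrated with a minimal sketch. The fusion-by-concatenation strategy, the embedding dimensions (300 for word2vec, 768 for data2vec), and the per-vector L2 normalization below are all illustrative assumptions, not the paper's specified mechanism:

```python
import numpy as np

# Hypothetical pre-computed embeddings for one scene-class name (e.g. "airport").
# In practice these would come from trained word2vec and data2vec models; random
# vectors stand in here, and the dimensions are assumptions.
word2vec_emb = np.random.rand(300).astype(np.float32)
data2vec_emb = np.random.rand(768).astype(np.float32)

def fuse_embeddings(w2v: np.ndarray, d2v: np.ndarray) -> np.ndarray:
    """L2-normalize each embedding, then concatenate into one semantic vector."""
    w2v = w2v / np.linalg.norm(w2v)
    d2v = d2v / np.linalg.norm(d2v)
    return np.concatenate([w2v, d2v])

semantic_vec = fuse_embeddings(word2vec_emb, data2vec_emb)
print(semantic_vec.shape)  # (1068,)
```

Normalizing each embedding before concatenation keeps either source from dominating the fused semantic space purely through scale; the resulting vector would then serve as the class prototype against which visual features are compared.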