Enhancing Generative Generalized Zero Shot Learning via Multi-Space Constraints and Adaptive Integration

Published: 01 Jan 2024 · Last Modified: 19 Feb 2025 · MMM (1) 2024 · CC BY-SA 4.0
Abstract: Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes, with no labeled data available for the unseen classes. Generative GZSL models synthesize features of unseen classes from semantic embeddings. The key challenge is to retain the semantic consistency and discriminative ability of the generated features so that they approach the real feature distribution. In this paper, we tackle these challenges by introducing additional spaces, beyond the original image feature space, to constrain the generation process, and we propose a GAN-based model called f-CLSWGAN-VAE2. Specifically, we incorporate cross-modal aligned variational autoencoders (VAE2) to align generated features with their corresponding semantic embeddings in a semantic alignment space, encouraging the generator to synthesize semantically consistent features. In addition, we apply contrastive learning in a separate image embedding space to alleviate confusion between generated instances. Moreover, we propose a novel adaptive integration strategy that adaptively weights the classification results from multiple spaces using a binary classifier trained in the semantic alignment space. Experiments on four popular GZSL datasets show that our model significantly outperforms the baseline and achieves results comparable to other methods.
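The adaptive integration strategy mentioned above can be illustrated with a minimal sketch: per-space class probabilities are fused with a gating weight produced by a binary classifier. All function and variable names here are illustrative assumptions, not from the paper, and the gating semantics are a simplification of the described method.

```python
import numpy as np

def adaptive_integration(probs_space_a, probs_space_b, gate_prob):
    """Fuse classification results from two spaces.

    probs_space_a, probs_space_b: (N, C) class probabilities from the
        two classification spaces (e.g., image feature space and
        semantic alignment space).
    gate_prob: (N,) output of a binary classifier, interpreted here as
        the per-sample weight given to space A.

    Names and exact gating semantics are hypothetical sketches of the
    adaptive weighting idea, not the paper's implementation.
    """
    gate = gate_prob[:, None]  # reshape to (N, 1) for broadcasting
    fused = gate * probs_space_a + (1.0 - gate) * probs_space_b
    return fused.argmax(axis=1)  # predicted class index per sample

# Toy usage: two samples, two classes.
probs_a = np.array([[0.7, 0.3], [0.2, 0.8]])
probs_b = np.array([[0.4, 0.6], [0.6, 0.4]])
gate = np.array([0.9, 0.1])  # binary classifier confidence in space A
preds = adaptive_integration(probs_a, probs_b, gate)
```

When the gate is near 1, the prediction follows space A; near 0, it follows space B, so the binary classifier effectively routes each sample to the space it trusts more.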