Abstract: Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes unavailable during training, but also seen classes used at the training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., attribute space). Most existing GZSL methods learn a cross-modal mapping between the visual feature space and the semantic space. However, a mapping model learned only from the seen classes produces an inherent bias when applied to the unseen classes. To tackle this problem, this paper integrates a deep embedding network (DE) and a modified variational autoencoder (VAE) into a novel model (DE-VAE) that learns a latent space shared by both image features and class embeddings. Specifically, the proposed model first employs DE to learn the mapping from the semantic space to the visual feature space, and then utilizes VAE to transform both the original visual features and the features obtained by the mapping into latent features. Finally, the latent features are used to train a softmax classifier. Extensive experiments on four GZSL benchmark datasets show that the proposed model significantly outperforms the state of the art.
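Below is a minimal PyTorch sketch of the pipeline described in the abstract: a deep embedding network maps class attributes to the visual feature space, a VAE encodes both real and embedded features into a shared latent space, and the latent features feed a softmax classifier. All module names, layer sizes, and dimensions here are illustrative assumptions, not the authors' exact architecture or hyperparameters.

```python
# Hypothetical sketch of the DE-VAE pipeline; sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepEmbedding(nn.Module):
    """DE: maps class embeddings (e.g., attribute vectors) into the visual feature space."""
    def __init__(self, attr_dim=85, feat_dim=2048, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim), nn.ReLU(),
        )

    def forward(self, attrs):
        return self.net(attrs)

class VAE(nn.Module):
    """VAE: encodes (real or DE-generated) visual features into a shared latent space."""
    def __init__(self, feat_dim=2048, latent_dim=64, hidden=512):
        super().__init__()
        self.enc = nn.Linear(feat_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Usage: encode seen-class image features and DE-embedded class attributes
# into latent vectors, then fit a softmax classifier on the latent features.
de, vae = DeepEmbedding(), VAE()
image_feats = torch.randn(32, 2048)   # e.g., CNN features of seen-class images
class_attrs = torch.randn(32, 85)     # attribute vectors of (unseen) classes
latent_img, _ = vae.encode(image_feats)
latent_attr, _ = vae.encode(de(class_attrs))
classifier = nn.Linear(64, 50)        # softmax classifier over all classes
```

In this sketch, training the classifier on latent features from both sources is what lets it cover seen and unseen classes jointly; the exact loss weighting and training schedule would follow the paper's full method, which the abstract does not specify.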