Abstract: Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes unavailable during training, but also seen classes used at the training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., attribute space). Most existing GZSL methods learn a cross-modal mapping between the visual feature space and the semantic space. However, a mapping model learned only from the seen classes produces an inherent bias when applied to the unseen classes. To tackle this problem, this paper integrates a deep embedding network (DE) and a modified variational autoencoder (VAE) into a novel model (DE-VAE) that learns a latent space shared by both image features and class embeddings. Specifically, the proposed model first employs DE to learn the mapping from the semantic space to the visual feature space, and then utilizes VAE to transform both the original visual features and the features obtained by the mapping into latent features. Finally, the latent features are used to train a softmax classifier. Extensive experiments on four GZSL benchmark datasets show that the proposed model significantly outperforms the state of the art.
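Below is a minimal PyTorch sketch of the pipeline described in the abstract: a deep embedding network maps class attributes to the visual feature space, a VAE encodes both real and embedded features into a shared latent space, and the latent features feed a softmax classifier. All module names, layer sizes, and dimensions here are illustrative assumptions, not the authors' exact architecture or hyperparameters.

```python
# Hypothetical sketch of the DE-VAE pipeline; sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepEmbedding(nn.Module):
    """DE: maps class embeddings (e.g., attribute vectors) into the visual feature space."""
    def __init__(self, attr_dim=85, feat_dim=2048, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim), nn.ReLU(),
        )

    def forward(self, attrs):
        return self.net(attrs)

class VAE(nn.Module):
    """VAE: encodes (real or DE-generated) visual features into a shared latent space."""
    def __init__(self, feat_dim=2048, latent_dim=64, hidden=512):
        super().__init__()
        self.enc = nn.Linear(feat_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Usage: encode seen-class image features and DE-embedded class attributes
# into latent vectors, then fit a softmax classifier on the latent features.
de, vae = DeepEmbedding(), VAE()
image_feats = torch.randn(32, 2048)   # e.g., CNN features of seen-class images
class_attrs = torch.randn(32, 85)     # attribute vectors of (unseen) classes
latent_img, _ = vae.encode(image_feats)
latent_attr, _ = vae.encode(de(class_attrs))
classifier = nn.Linear(64, 50)        # softmax classifier over all classes
```

In this sketch, training the classifier on latent features from both sources is what lets it cover seen and unseen classes jointly; the exact loss weighting and training schedule would follow the paper's full method, which the abstract does not specify.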