- Keywords: generalized zero-shot learning, zero-shot learning, few-shot learning, image classification
- TL;DR: We use VAEs to learn a shared latent space embedding between image features and attributes and thereby achieve state-of-the-art results in generalized zero-shot learning.
- Abstract: Most approaches in generalized zero-shot learning rely on cross-modal mapping between an image feature space and a class embedding space or on generating artificial image features. However, learning a shared cross-modal embedding by aligning the latent spaces of modality-specific autoencoders is shown to be promising in (generalized) zero-shot learning. While following the same direction, we also take artificial feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by aligned variational autoencoders, for the purpose of generating latent features to train a softmax classifier. We evaluate our learned latent features on conventional benchmark datasets and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.