Domain-aware multi-modality fusion network for generalized zero-shot learning

Jia Wang, Xiao Wang, Han Zhang

Published: 01 Jan 2022, Last Modified: 15 May 2025. Venue: Neurocomputing 2022. License: CC BY-SA 4.0

Abstract: Generalized zero-shot learning (GZSL) is a challenging problem that aims to recognize images from both seen and unseen classes. Existing methods suffer from the bias problem: the model tends to misclassify unseen samples into seen classes. Moreover, recent methods mainly rely on a single semantic representation (e.g., attributes) for knowledge transfer. Although some methods exploit multiple sources of semantic information, they combine them only through simple concatenation or transformations, which limits performance. To address GZSL, we propose a two-step method that tackles these two challenges progressively. First, a local-neighborhood-based gating model is designed to leverage the distributions of both the original data space and a learned latent space for domain detection. This model separates seen and unseen samples, thereby decomposing GZSL into a conventional zero-shot learning (ZSL) problem and a supervised classification problem. Second, we design a graph convolutional network (GCN)-based model that fuses multiple semantic modalities to solve the decomposed ZSL problem. By using one primary modality as the node input and another to construct the node relationships, our model fuses multiple sources of information effectively and helps learn more discriminative visual classifiers. We evaluate our method, the local-neighborhood-based domain-aware and GCN-based multi-modality fusion network (LND-GMF), on five benchmark datasets. The results show that our method outperforms state-of-the-art methods by a large margin.
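To make the first step concrete, here is a minimal, hypothetical sketch of a local-neighborhood gate. It is not the authors' implementation: the abstract does not specify the scoring rule, so this sketch assumes that a test sample is scored by its mean distance to the k nearest seen-class training features, computed in both the original space and a latent space, and that the two scores are mixed with a weight `alpha` and compared against a `threshold`. The linear `project` map stands in for the learned latent space, and every function and parameter name (`knn_mean_distance`, `is_seen`, `route`, `alpha`, `threshold`) is illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_mean_distance(x: Vector, bank: List[Vector], k: int) -> float:
    """Mean distance from x to its k nearest neighbours in a non-empty bank."""
    dists = sorted(euclidean(x, b) for b in bank)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def project(x: Vector, weights: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for the learned latent space: a linear map W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def is_seen(
    x: Vector,
    seen_feats: List[Vector],
    weights: List[List[float]],
    k: int = 3,
    alpha: float = 0.5,
    threshold: float = 1.0,
) -> bool:
    """Assumed gating rule: small local-neighbourhood distance => seen domain.

    Mixes the k-NN distance in the original space with the k-NN distance in
    the (assumed) latent space; alpha and threshold are illustrative knobs.
    """
    d_orig = knn_mean_distance(x, seen_feats, k)
    latent_bank = [project(s, weights) for s in seen_feats]
    d_lat = knn_mean_distance(project(x, weights), latent_bank, k)
    score = alpha * d_orig + (1.0 - alpha) * d_lat
    return score <= threshold


def route(x: Vector, seen_feats: List[Vector], weights: List[List[float]], **kwargs) -> str:
    """Decompose GZSL: seen-looking samples go to a supervised classifier, the rest to ZSL."""
    return "supervised" if is_seen(x, seen_feats, weights, **kwargs) else "zsl"
```

For example, with seen features clustered near the origin and an identity projection, `route([0.05, 0.05], seen, identity)` sends the sample to the supervised branch, while `route([5.0, 5.0], seen, identity)` sends it to the ZSL branch. In practice the latent map and the threshold would have to be learned or tuned on held-out data.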
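The second step, in which one primary modality supplies the node inputs and another defines the node relationships, can be illustrated with a single GCN layer. The sketch below is an assumption-laden toy, not the LND-GMF model: it assumes the auxiliary modality (e.g., attributes) is turned into a symmetric k-nearest-neighbour graph via cosine similarity, that the graph is normalized with the common GCN propagation rule D^{-1/2}(A + I)D^{-1/2}, and that the layer outputs one visual classifier weight vector per class. The helper names (`build_adjacency`, `normalize`, `gcn_layer`, `classify`) and the weight matrix `w` are hypothetical; in a real model `w` would be trained rather than fixed.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0 when either vector is zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_adjacency(aux: Matrix, k: int) -> Matrix:
    """Assumed graph construction: link each class to its k most similar classes
    in the auxiliary modality (positive similarity only), symmetrically."""
    n = len(aux)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(aux[i], aux[j]), j) for j in range(n) if j != i), reverse=True)
        for s, j in sims[:k]:
            if s > 0:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(adj: Matrix) -> Matrix:
    """GCN normalisation D^-1/2 (A + I) D^-1/2; self-loops keep every degree >= 1."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix, relu: bool = True) -> Matrix:
    """One propagation step: ReLU(A_norm @ H @ W), where H holds the primary modality."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in out] if relu else out


def classify(x: Sequence[float], class_weights: Matrix) -> int:
    """Pick the class whose generated classifier gives the highest dot-product score."""
    scores = [sum(a * b for a, b in zip(x, wc)) for wc in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

As a usage example, three classes with attribute vectors `[[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1]]` and `k=1` yield a graph linking only classes 0 and 1, so their primary-modality embeddings are averaged while class 2 keeps its own. Separating the input modality from the graph modality is what lets the layer mix information from both without simply concatenating them.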