Semi-supervised cross-modal representation learning with GAN-based Asymmetric Transfer Network

J. Vis. Commun. Image Represent., 2020 (modified: 08 Nov 2021)
Abstract:

Highlights
- Asymmetric mappings can better model the differences in low-level features.
- The transfer learning mechanism ensures the training efficiency of cross-modal retrieval.
- The semantic constraint divides samples into several semantically discriminative clusters.
- Adversarial learning ensures the alignment of the distributions from the two modalities.

In this paper, we propose a semi-supervised common representation learning method with a GAN-based Asymmetric Transfer Network (GATN) for cross-modal retrieval. GATN uses an asymmetric pipeline to guarantee semantic consistency and adopts a Generative Adversarial Network (GAN) to fit the distributions of the different modalities. Specifically, common representation learning across modalities proceeds in two stages: (1) in the first stage, GATN trains the source mapping network to learn the semantic representation of the text modality in a supervised manner; and (2) in the second stage, a GAN-based unsupervised modality transfer method, consisting of a generative network (the target mapping network) and a discriminative network, guides the training of the target mapping network. Experimental results on three widely used benchmarks show that GATN achieves better performance than several existing state-of-the-art methods.
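The two-stage scheme described above can be pictured with a minimal PyTorch sketch. The network names (Mapper, Discriminator), feature dimensions, optimizers, and training loops below are illustrative assumptions, not the authors' exact architecture or hyperparameters; the sketch only shows the general idea of supervised source-branch training followed by adversarial alignment of the target branch.

```python
# Hypothetical sketch of the two-stage training idea (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_TXT, FEAT_IMG, COMMON, NUM_CLASSES = 300, 4096, 256, 10  # assumed toy sizes


class Mapper(nn.Module):
    """Maps a modality-specific feature vector into the common space."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, COMMON))

    def forward(self, x):
        return self.net(x)


class Discriminator(nn.Module):
    """Predicts whether a common-space embedding came from the source (text) modality."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(COMMON, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, z):
        return self.net(z)


source_map = Mapper(FEAT_TXT)   # text branch, trained first with supervision
target_map = Mapper(FEAT_IMG)   # image branch, acts as the generator in stage 2
classifier = nn.Linear(COMMON, NUM_CLASSES)
disc = Discriminator()

# ---- Stage 1: supervised training of the source (text) mapping network ----
opt_src = torch.optim.Adam(list(source_map.parameters()) +
                           list(classifier.parameters()), lr=1e-3)
txt_feats = torch.randn(64, FEAT_TXT)              # placeholder labelled text features
txt_labels = torch.randint(0, NUM_CLASSES, (64,))
for _ in range(5):                                 # a few illustrative steps
    logits = classifier(source_map(txt_feats))
    loss = F.cross_entropy(logits, txt_labels)     # semantic (label) supervision
    opt_src.zero_grad(); loss.backward(); opt_src.step()

# ---- Stage 2: GAN-based unsupervised transfer to the target (image) branch ----
# The source mapper is frozen; the target mapper is trained so that the
# discriminator cannot tell image embeddings apart from text embeddings.
for p in source_map.parameters():
    p.requires_grad_(False)
opt_gen = torch.optim.Adam(target_map.parameters(), lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)
img_feats = torch.randn(64, FEAT_IMG)              # placeholder unlabelled image features
real_lbl, fake_lbl = torch.ones(64, 1), torch.zeros(64, 1)
for _ in range(5):
    # Discriminator step: text embeddings are "real", image embeddings are "fake".
    z_txt = source_map(txt_feats).detach()
    z_img = target_map(img_feats).detach()
    d_loss = (F.binary_cross_entropy_with_logits(disc(z_txt), real_lbl) +
              F.binary_cross_entropy_with_logits(disc(z_img), fake_lbl))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    # Generator step: push image embeddings toward the text distribution.
    g_loss = F.binary_cross_entropy_with_logits(disc(target_map(img_feats)), real_lbl)
    opt_gen.zero_grad(); g_loss.backward(); opt_gen.step()
```

After stage 2, retrieval would be performed by embedding both modalities with their respective mappers and comparing them in the shared space; the asymmetry lies in training the two mapping networks differently rather than with a single symmetric objective.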