Cross-modal discriminant adversarial network

Peng Hu, Xi Peng, Hongyuan Zhu, Jie Lin, Liangli Zhen, Wei Wang, Dezhong Peng

2021 (modified: 14 Dec 2021), Pattern Recognit. 2021

Highlights
• We propose a novel method, the Cross-modal discriminant Adversarial Network (CAN), to learn a latent discriminant space for cross-modal data, with a novel network structure and a novel learning mechanism, the Cross-modal Discriminant Mechanism (CDM). In brief, CDM projects the generated features of all modalities into a latent common space and gives positive/negative feedback to the adversarial learning. Our method can therefore reduce the modality discrepancy while preserving the discriminative information in the common space.
• To improve CDM, a novel objective function is presented to learn a common space in which within-class samples are compacted and between-class samples are scattered. Furthermore, the transformations of CDM can be solved analytically from the generated features, thus avoiding the trap of local minima.
• To avoid the trivial solutions that arise from directly optimizing the CDM objective function, a novel logarithmic eigenvalue-based loss is proposed. Another advantage of the proposed loss is that it pushes as much discrimination as possible into all latent directions of the CDM transformations instead of only the dominant ones.

Abstract: Cross-modal retrieval aims at retrieving relevant points across different modalities, such as retrieving images via texts. One key challenge of cross-modal retrieval is narrowing the heterogeneous gap across diverse modalities. To overcome this challenge, we propose a novel method termed Cross-modal discriminant Adversarial Network (CAN). Taking bi-modal data as a showcase, CAN consists of two parallel modality-specific generators, two modality-specific discriminators, and a Cross-modal Discriminant Mechanism (CDM). Specifically, the generators project the diverse modalities into a latent cross-modal discriminant space. Meanwhile, the discriminators compete against the generators to alleviate the heterogeneous discrepancy in this space, i.e., the generators try to generate unified features that confuse the discriminators, and the discriminators aim to classify the generated results. To further remove redundancy and preserve discrimination, we propose CDM to project the generated results into a single common space, accompanied by a novel eigenvalue-based loss. Thanks to the eigenvalue-based loss, CDM can push as much discriminative power as possible into all latent directions. To demonstrate the effectiveness of CAN, comprehensive experiments are conducted on four multimedia datasets, comparing with 15 state-of-the-art approaches.
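The abstract does not state the CDM objective or the eigenvalue-based loss in closed form. As a rough illustration only, the sketch below assumes a Fisher-style criterion: within-class and between-class scatter matrices are computed from the stacked features of all modalities, the projection is obtained analytically from a generalized eigenproblem, and the loss is the negative mean logarithm of the leading eigenvalues, so that every retained direction contributes rather than only the dominant one. The function names, the regularizer `reg`, and the choice of keeping `n_classes - 1` directions are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def scatter_matrices(features, labels):
    """Return within-class (Sw) and between-class (Sb) scatter matrices.

    features: (n_samples, dim) array, e.g. image and text features stacked.
    labels:   (n_samples,) integer class labels shared across modalities.
    """
    dim = features.shape[1]
    mean_all = features.mean(axis=0)
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    for c in np.unique(labels):
        Xc = features[labels == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    return Sw, Sb


def log_eigenvalue_loss(features, labels, reg=1e-3, eps=1e-8):
    """Hypothetical logarithmic eigenvalue loss (assumed form, not the paper's).

    Solves Sb w = lambda (Sw + reg * I) w analytically and returns
    (loss, W), where W holds the leading eigenvectors as columns.
    """
    Sw, Sb = scatter_matrices(features, labels)
    dim = Sw.shape[0]
    # Whiten with the Cholesky factor so a symmetric eigensolver can be used.
    L = np.linalg.cholesky(Sw + reg * np.eye(dim))
    L_inv = np.linalg.inv(L)
    M = L_inv @ Sb @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending order
    k = min(len(np.unique(labels)) - 1, dim)  # rank of Sb is at most C - 1
    top_vals = eigvals[-k:]
    # The log compresses large eigenvalues, so small ones dominate the gradient.
    loss = -np.mean(np.log(top_vals + eps))
    W = L_inv.T @ eigvecs[:, -k:]  # map back from the whitened space
    return loss, W
```

In a training loop, a loss of this kind would be evaluated on each mini-batch of generated features and added to the adversarial objective; a differentiable implementation would use an autodiff framework rather than plain NumPy.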
