Abstract: Conventional methods address the cross-modal retrieval problem by projecting multi-modal data into a shared representation space. This strategy inevitably discards modality-specific information, reducing retrieval accuracy. In this paper, we propose heterogeneous graph embeddings to preserve richer cross-modal information: the embedding from one modality is compensated with aggregated embeddings from the other modality. In particular, a self-denoising tree search is designed to mitigate the "label noise" problem, making the heterogeneous neighborhood more semantically relevant, while dual-path aggregation tackles the "modality imbalance" problem, equipping each sample with comprehensive dual-modality information. The final heterogeneous graph embedding is obtained by feeding the aggregated dual-modality features into a cross-modal self-attention module. Experiments on cross-modality person re-identification and image-text retrieval tasks validate the superiority and generality of the proposed method.
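To make the pipeline concrete, below is a minimal sketch of the dual-path aggregation and cross-modal self-attention fusion described in the abstract. All names (`DualPathAggregator`, `intra_nbrs`, `cross_nbrs`, the mean-pooling of neighbors, head counts) are illustrative assumptions, not the authors' implementation; the paper's self-denoising tree search, which selects the neighbor sets, is omitted here.

```python
# Hypothetical sketch, assuming PyTorch and mean-pooled neighbor aggregation.
import torch
import torch.nn as nn


class DualPathAggregator(nn.Module):
    """Fuses an anchor embedding with aggregated neighbors from BOTH modalities
    (one path per modality), then applies self-attention over the token set."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.intra_proj = nn.Linear(dim, dim)   # path 1: same-modality neighbors
        self.cross_proj = nn.Linear(dim, dim)   # path 2: other-modality neighbors
        # Cross-modal self-attention over the (anchor, intra, cross) tokens.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, intra_nbrs, cross_nbrs):
        # x:          (B, D)    anchor embeddings
        # intra_nbrs: (B, K, D) same-modality neighbor embeddings
        # cross_nbrs: (B, K, D) other-modality neighbor embeddings
        intra = self.intra_proj(intra_nbrs.mean(dim=1))  # aggregate path 1
        cross = self.cross_proj(cross_nbrs.mean(dim=1))  # aggregate path 2
        tokens = torch.stack([x, intra, cross], dim=1)   # (B, 3, D)
        fused, _ = self.attn(tokens, tokens, tokens)     # attention-based fusion
        return fused[:, 0]                               # final graph embedding


# Toy usage: 8 anchors, 5 neighbors per path, 256-d features.
agg = DualPathAggregator(dim=256)
x = torch.randn(8, 256)
emb = agg(x, torch.randn(8, 5, 256), torch.randn(8, 5, 256))
print(emb.shape)  # torch.Size([8, 256])
```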