Semantic supervised learning based Cross-Modal Retrieval

Zhuoyi Li, Hao Fu, Guanghua Gu

Published: 2024, Last Modified: 13 Nov 2024. Venue: ACM TUR-C 2024. License: CC BY-SA 4.0

Abstract: Although cross-modal retrieval approaches have made significant progress over the past few decades, the task remains challenging because of the difficulty of bridging the heterogeneity gap between modalities, especially under domain shift. To fully capture the semantic consistency in the limited supervision information, we introduce semantic supervised learning strategies for cross-modal retrieval. We design a feature enhancement module to fully explore the information of heterogeneous samples. Specifically, we design an information optimization enhancement module for each modality and a bidirectional learning network with the training embedding. The bidirectional learning network implements a bidirectional learning loss to minimize the semantic gap between the representations of the domain and the forward domain.
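
The abstract does not give the form of the bidirectional learning loss. As a rough illustration only, the sketch below shows one common way a loss can be made bidirectional for retrieval: a symmetric cross-entropy over cosine similarities, computed once in the image-to-text direction and once in the text-to-image direction. The function names, the temperature value, and the choice of this particular loss are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def mean_diagonal_cross_entropy(sim, temperature):
    # For each row i, treat column i as the matching item and return the
    # mean of -log softmax(sim[i] / temperature)[i] over all rows.
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)


def bidirectional_loss(image_emb, text_emb, temperature=0.1):
    # Hypothetical symmetric loss: pair i of image_emb matches pair i of text_emb.
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # sim[i][j] is the cosine similarity between image i and text j.
    sim = [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    image_to_text = mean_diagonal_cross_entropy(sim, temperature)
    text_to_image = mean_diagonal_cross_entropy(sim_t, temperature)
    return 0.5 * (image_to_text + text_to_image)
```

With correctly paired embeddings the loss is close to zero, and it grows when the pairs are shuffled, which is the behavior a retrieval objective of this kind relies on in both query directions.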