Abstract: The success of current deep cross-modal hashing rests on the default assumption that cross-modal data are fully observed. However, this strict assumption rarely holds in practical large-scale scenarios, where incomplete cross-modal instances and unpaired relations directly disable the training of prevalent cross-modal retrieval methods. The main challenges arise from collapsed semantic- and modality-level similarity learning as well as uncertain cross-modal correspondence. In this paper, we propose a Contrastive Incomplete Cross-modal Hashing (CICH) network, which simultaneously performs cross-modal semantic coordination, unbalanced similarity calibration, and contextual correspondence alignment. Specifically, we design a prototypical semantic similarity coordination module that globally rebuilds partially observed cross-modal similarities under an asymmetric learning scheme. Meanwhile, a semantic-aware contrastive hashing module is established to adaptively perceive and remedy the unbalanced similarities across modalities via semantic transition, generating discriminative hash codes. Additionally, a contextual correspondence alignment module is conceived to maximally capture the knowledge shared across modalities and to eliminate correspondence uncertainty through a dual contextual information bottleneck formulation. To the best of our knowledge, this is the first successful attempt to introduce contrastive learning into incomplete deep cross-modal hashing. Extensive experiments validate the superiority of CICH over state-of-the-art methods.
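The abstract does not specify the exact CICH objective. As background only, the sketch below shows a generic symmetric InfoNCE-style contrastive loss over relaxed (tanh) hash codes, of the kind that typically underlies contrastive cross-modal hashing; all function names, the batch setup, and the temperature value are illustrative assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_codes, txt_codes, temperature=0.3):
    """Illustrative symmetric InfoNCE loss over relaxed hash codes.

    img_codes, txt_codes: (batch, n_bits) real-valued code relaxations,
    where row i of each tensor comes from the same paired instance.
    """
    # Normalize so dot products become cosine similarities.
    img = F.normalize(img_codes, dim=1)
    txt = F.normalize(txt_codes, dim=1)
    # Pairwise similarity logits between every image and every text code.
    logits = img @ txt.t() / temperature
    # Paired (diagonal) entries are positives; all others are negatives.
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical usage with random relaxed 64-bit codes for a batch of 8 pairs.
img_codes = torch.tanh(torch.randn(8, 64))
txt_codes = torch.tanh(torch.randn(8, 64))
loss = cross_modal_contrastive_loss(img_codes, txt_codes)
```

Under incomplete data, the diagonal-positive assumption above breaks down for unpaired instances, which is precisely the correspondence uncertainty the abstract says CICH is designed to handle.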