Abstract: With the massive emergence of multi-modal data, cross-modal retrieval (CMR) has become a prominent research topic. Thanks to its fast retrieval speed and low storage cost, cross-modal hashing (CMH) provides a feasible solution for large-scale multi-modal data. Previous CMH methods typically learn common hash codes directly to fuse different modalities. Although they have achieved some success, two limitations remain: 1) These approaches often prioritize reducing the heterogeneity of multi-modal data by learning consensus hash codes, yet they may sacrifice information specific to each modality. 2) They frequently rely on pairwise similarities to guide hash learning while neglecting correlations in class distributions, which limits their ability to reduce the differences among modalities. To overcome these two issues, we propose a novel Distribution Consistency Guided Hashing (DCGH) framework. Specifically, we first learn modality-specific representations to extract private discriminative information. We then learn consensus hash codes from these private representations through consensus hashing learning, thereby merging modality-specific information with cross-modal consistency. Finally, we propose distribution consistency learning, which guides the hash codes to follow a similar class distribution across modalities, thereby exploring more consistent information. Extensive experiments on four benchmark datasets demonstrate the effectiveness of DCGH on both fully paired and partially paired CMR tasks.
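For intuition, a minimal sketch of the three components named in the abstract is given below, assuming a simple two-modality image-text setup; the encoder dimensions, the shared hash projection, and the symmetric-KL distribution consistency term are hypothetical stand-ins rather than the paper's actual architecture and losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCGHSketch(nn.Module):
    """Hypothetical sketch of the three components described in the abstract."""
    def __init__(self, img_dim=4096, txt_dim=1386, hidden_dim=512, code_len=64, n_classes=24):
        super().__init__()
        # 1) Modality-specific representation learning: a private encoder per modality.
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, hidden_dim), nn.ReLU())
        self.txt_encoder = nn.Sequential(nn.Linear(txt_dim, hidden_dim), nn.ReLU())
        # 2) Consensus hashing learning: a shared projection maps private features to common codes.
        self.hash_layer = nn.Linear(hidden_dim, code_len)
        # 3) Class-distribution head used by the distribution consistency term.
        self.cls_head = nn.Linear(code_len, n_classes)

    def forward(self, img, txt):
        f_img, f_txt = self.img_encoder(img), self.txt_encoder(txt)
        b_img = torch.tanh(self.hash_layer(f_img))  # relaxed (continuous) binary codes
        b_txt = torch.tanh(self.hash_layer(f_txt))
        return b_img, b_txt, self.cls_head(b_img), self.cls_head(b_txt)

def distribution_consistency_loss(logits_img, logits_txt):
    # Encourage both modalities to induce similar class distributions;
    # symmetric KL here is only an illustrative stand-in for the paper's objective.
    p_img, p_txt = F.softmax(logits_img, dim=1), F.softmax(logits_txt, dim=1)
    kl_it = F.kl_div(p_img.log(), p_txt, reduction="batchmean")
    kl_ti = F.kl_div(p_txt.log(), p_img, reduction="batchmean")
    return 0.5 * (kl_it + kl_ti)

# Toy usage with random features
model = DCGHSketch()
img, txt = torch.randn(8, 4096), torch.randn(8, 1386)
b_i, b_t, c_i, c_t = model(img, txt)
loss = distribution_consistency_loss(c_i, c_t)
```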
Primary Subject Area: [Engagement] Multimedia Search and Recommendation
Secondary Subject Area: [Content] Multimodal Fusion, [Experience] Multimedia Applications
Relevance To Conference: Cross-modal image-text retrieval is a significant area within multimedia/multimodal processing. The contribution of cross-modal retrieval to multimedia/multimodal processing lies in its ability to bridge the semantic gap between different modalities, enabling enhanced understanding, retrieval, knowledge transfer, and user experience across various applications and domains.
Supplementary Material: zip
Submission Number: 131