Abstract: Cross-modal hashing is favored in large-scale cross-modal retrieval for its fast query speed and low storage cost. Existing cross-modal hashing methods typically rest on the idealized assumption that data from different modalities are completely paired. In practical scenarios, however, missing modal data is common due to technical difficulties, hardware failures, and challenges in data collection, which contradicts this assumption. In this paper, we propose an innovative unsupervised cross-modal hashing framework, named Semantic Reconstruction Guided Missing Hashing (SRGMH). Specifically, we utilize Dual-Variational Autoencoders (D-VAEs) to map features of different modalities into a shared low-dimensional latent representation, better bridging the gaps between modalities. Notably, we generate pseudo-representations for missing modalities from their observed counterparts, effectively addressing the problem of missing modal data. Moreover, we construct a refined adjacency similarity matrix to align the latent representations of different modalities, enhancing the quality of the completed data and ensuring semantic consistency across modalities. Finally, we embed the adjacency similarity matrices into the shared latent representation space to jointly learn hash functions and hash codes, which enriches their semantic content and preserves structural similarity. Experiments on three real-world datasets demonstrate the superior performance of the proposed method in both retrieval accuracy and efficiency.
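The dual-encoder idea summarized above can be illustrated with a minimal sketch: two modality-specific VAE encoders map image and text features into a shared latent space, a missing modality's latent representation is imputed from the observed one, and a hashing head produces binary codes. This is not the authors' implementation; the layer sizes, the `hash_bits` parameter, and the simple latent-copy imputation are illustrative assumptions standing in for SRGMH's pseudo-representation generation and similarity-guided hash learning.

```python
# Minimal sketch (assumed architecture, not the SRGMH code) of dual VAE encoders
# with a shared latent space and a hashing head.
import torch
import torch.nn as nn

class ModalityVAE(nn.Module):
    """One branch of the dual VAE: encodes a modality into the shared latent
    space and decodes back to that modality's feature space."""
    def __init__(self, feat_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(512, latent_dim)   # log-variance of q(z|x)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, feat_dim))

    def encode(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar

    def decode(self, z):
        return self.decoder(z)

class DualVAEHashing(nn.Module):
    """Shared-latent dual VAE with a hashing head; dimensions are illustrative."""
    def __init__(self, img_dim=4096, txt_dim=1386, latent_dim=128, hash_bits=64):
        super().__init__()
        self.img_vae = ModalityVAE(img_dim, latent_dim)
        self.txt_vae = ModalityVAE(txt_dim, latent_dim)
        self.hash_layer = nn.Linear(latent_dim, hash_bits)

    def forward(self, img=None, txt=None):
        # If one modality is missing, reuse the other's latent code as a
        # pseudo-representation (a simplified stand-in for the paper's mechanism).
        z_img = self.img_vae.encode(img)[0] if img is not None else None
        z_txt = self.txt_vae.encode(txt)[0] if txt is not None else None
        if z_img is None:
            z_img = z_txt
        if z_txt is None:
            z_txt = z_img
        # Relaxed (tanh) codes during training; sign() yields binary codes at query time.
        return torch.tanh(self.hash_layer(z_img)), torch.tanh(self.hash_layer(z_txt))

# Usage: a text-only sample still receives an image-side hash code.
model = DualVAEHashing()
b_img, b_txt = model(img=None, txt=torch.randn(8, 1386))
hash_codes = torch.sign(b_txt)   # {-1, +1} codes used for retrieval
```

In a full system, the relaxed codes would additionally be aligned with an adjacency similarity matrix and reconstruction losses, as described in the abstract; the sketch only covers the encoding and imputation path.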