Cross-Modal Hash Retrieval with Category Semantics

Published: 01 Jan 2024 · Last Modified: 27 Jan 2025 · MMM (1) 2024 · CC BY-SA 4.0
Abstract: With multi-modal resources springing up, the field of cross-modal research has witnessed rapid advancement. Among its tasks, cross-modal hash retrieval has attracted attention for the efficiency it offers during retrieval. However, most existing cross-modal hash retrieval methods focus on capturing the relationship between modalities and train in a one-to-one correspondence manner, thereby neglecting the overall distribution of the dataset. In response, we propose a novel approach that introduces semantic relationships between categories to capture class relationships and measure the full data distribution. First, the original image and text are encoded to obtain their respective representations. These are then fused into a joint representation, which is passed through a classification head to obtain category semantics. The semantic information is used to allocate hash centers drawn from a Hadamard matrix. Meanwhile, the image and text representations are fed into a hash layer to produce hash codes. We train the network with a modality-matching loss for alignment, together with center and classification losses that capture the full data distribution. Extensive evaluations on two benchmark datasets demonstrate that our approach achieves competitive performance compared with state-of-the-art methods.
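The abstract's use of a Hadamard matrix as a source of hash centers follows a common pattern in central-similarity hashing: rows of a Hadamard matrix are mutually orthogonal ±1 vectors, so any two centers are separated by exactly half the code length in Hamming distance. The sketch below illustrates this allocation step only; the function names and the fallback of stacking H and -H when there are more classes than rows are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n must be a power of 2).
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def allocate_hash_centers(num_classes, code_length):
    # Each row of H is a +/-1 vector; distinct rows are orthogonal,
    # hence any two centers differ in exactly code_length / 2 bits.
    H = hadamard(code_length)
    if num_classes <= code_length:
        return H[:num_classes]
    # Stacking H and -H doubles the pool of available centers.
    return np.vstack([H, -H])[:num_classes]

# Example: 10 categories, 16-bit hash codes.
centers = allocate_hash_centers(num_classes=10, code_length=16)
```

Because the centers are maximally and uniformly separated, assigning one per category gives the center loss a well-spread target structure over the whole dataset rather than only pairwise constraints.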