Abstract: Deep hashing has recently become an essential tool for cross-modal retrieval on large-scale datasets. However, its performance heavily depends on accurate annotations for training the hashing model, and in real applications we usually obtain only low-quality label annotations owing to labor and time constraints. To mitigate the performance degradation caused by noisy labels, in this paper we propose a robust deep hashing method, called deep hashing with ranking learning (DHRL), for cross-modal retrieval. The proposed DHRL method consists of a refined semantic concept alignment module and a ranking-swapping module. In the first module, we adopt two transformers to perform semantic alignment between different modalities on a set of refined concepts, and then convert the aligned representations into hash codes to reduce the heterogeneity gap between modalities. The second module first identifies noisy labels in the training set and ranks the samples according to a ranking loss; it then swaps the ranking information between the network branches of the different modalities. Unlike existing robust hashing methods that assume a particular noise distribution, the proposed DHRL method requires no prior assumptions about the input data. Extensive experiments on three benchmark datasets show that the proposed DHRL method outperforms other state-of-the-art hashing methods.