TL;DR: We propose a novel method called Deep Unsupervised Hashing with External Guidance (DUH-EG), which incorporates external textual knowledge as semantic guidance to enhance discrete representation learning.
Abstract: Recently, deep unsupervised hashing has gained considerable attention in image retrieval due to its advantages in cost-free data labeling, computational efficiency, and storage savings. Although existing methods achieve promising performance by leveraging inherent visual structures within the data, they primarily focus on learning discriminative features from unlabeled images through limited internal knowledge, resulting in an intrinsic upper bound on their performance. To break through this intrinsic limitation, we propose a novel method, called Deep Unsupervised Hashing with External Guidance (DUH-EG), which incorporates external textual knowledge as semantic guidance to enhance discrete representation learning. Specifically, our DUH-EG: i) selects representative semantic nouns from an external textual database by minimizing their redundancy, then matches images with them to extract more discriminative external features; and ii) presents a novel bidirectional contrastive learning mechanism to maximize agreement between hash codes in internal and external spaces, thereby capturing discrimination from both external and intrinsic structures in Hamming space. Extensive experiments on four benchmark datasets demonstrate that our DUH-EG remarkably outperforms existing state-of-the-art hashing methods.
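The two components described in the abstract can be sketched in simplified form. The snippet below is a hedged illustration, not the authors' implementation: it substitutes random vectors for real image and noun embeddings, uses a greedy max-min selection as one plausible way to pick mutually dissimilar (low-redundancy) nouns, and uses a symmetric (bidirectional) InfoNCE-style loss between relaxed hash codes from the internal and external views. All function names, the similarity measure, and the loss formulation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_representative_nouns(noun_embs, k):
    """Greedily pick k nouns whose embeddings are mutually dissimilar
    (one plausible redundancy-minimizing scheme; an assumption here)."""
    n = noun_embs / np.linalg.norm(noun_embs, axis=1, keepdims=True)
    chosen = [0]                                  # start from an arbitrary noun
    for _ in range(k - 1):
        sim_to_chosen = n @ n[chosen].T           # (N, |chosen|) cosine sims
        # pick the noun whose worst-case similarity to the chosen set is lowest
        chosen.append(int(np.argmin(sim_to_chosen.max(axis=1))))
    return chosen

def bidirectional_contrastive_loss(h_int, h_ext, tau=0.5):
    """Symmetric InfoNCE between internal (image-derived) and external
    (noun-guided) relaxed hash codes; matching rows are positives."""
    a = h_int / np.linalg.norm(h_int, axis=1, keepdims=True)
    b = h_ext / np.linalg.norm(h_ext, axis=1, keepdims=True)
    logits = (a @ b.T) / tau
    idx = np.arange(len(a))
    def ce(lg):                                   # cross-entropy on the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)
        p = np.exp(lg) / np.exp(lg).sum(axis=1, keepdims=True)
        return -np.mean(np.log(p[idx, idx]))
    return 0.5 * (ce(logits) + ce(logits.T))      # both retrieval directions

# Toy data standing in for noun embeddings and two views of hash codes.
noun_embs = rng.standard_normal((50, 16))
picked = select_representative_nouns(noun_embs, 5)
h_int = np.tanh(rng.standard_normal((8, 32)))               # internal codes
h_ext = np.tanh(h_int + 0.1 * rng.standard_normal((8, 32))) # external codes
loss = bidirectional_contrastive_loss(h_int, h_ext)
```

Under this reading, agreement between the two spaces is maximized by driving the symmetric loss down, so the binarized codes inherit discrimination from both the visual structure and the external nouns.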
Lay Summary: In the digital world, quickly finding the right image from a huge collection is a big challenge. One way to do this is by using short binary codes (called “hash codes”) that help computers search faster. Many recent methods create these codes by training models directly on images without the need for human-provided labels, which saves both time and effort. However, their performance is often limited because they rely solely on the visual information within the images. To address this, we propose a novel method, called Deep Unsupervised Hashing with External Guidance (DUH-EG). Specifically, we use nouns from an external textual database to help the model better understand the content of images. By effectively comparing and integrating what the model “sees” in the images with what it “knows” from nouns, it can generate more accurate and useful codes. We test our DUH-EG method on four well-known image datasets, and it clearly does better than the best methods currently available.
Link To Code: https://github.com/XLearning-SCU/2025-ICML-DUHEG
Primary Area: General Machine Learning->Representation Learning
Keywords: Deep Unsupervised Hashing, External Guidance, Contrastive Learning, Image Retrieval
Submission Number: 9106