SoftHash: High-dimensional Hashing with A Soft Winner-Take-All Mechanism

Submitted to ICLR 2024 on 20 Sept 2023 (modified: 11 Feb 2024)
Primary Area: metric learning, kernel learning, and sparse coding
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Locality-sensitive hashing, Sparse expansive representations, Hebb rule, Winner-take-all, Image retrieval, Word similarity search
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Locality-Sensitive Hashing (LSH) is a classical algorithm that aims to hash similar data points into the same bucket with high probability. Inspired by the fly olfactory system, one LSH variant, $\textit{FlyHash}$, assigns hash codes in a high-dimensional space and shows strong performance for similarity search. However, the semantic representation capability of $\textit{FlyHash}$ is not yet satisfactory, since it is a data-independent hashing algorithm: its projection space is constructed randomly rather than adapted to the input data manifold. In this paper, we propose a data-dependent hashing algorithm named $\textit{SoftHash}$. In particular, $\textit{SoftHash}$ is motivated by biological nervous systems, which map input sensory signals into a high-dimensional space, to improve the semantic representation of hash codes. We learn the hashing projection function using a Hebbian-like learning rule coupled with the idea of Winner-Take-All (WTA); specifically, the synaptic weights are updated solely based on the activities of pre- and post-synaptic neurons. Unlike previous works that adopt a hard WTA rule, we introduce a soft WTA rule whereby non-winning neurons are not fully suppressed during learning. This gives weakly correlated data a chance to be learned, yielding more representative hash codes. We conduct extensive experiments on six real-world datasets for tasks including image retrieval and word similarity search. The experimental results demonstrate that our method significantly outperforms the baselines in terms of similarity search accuracy and speed.
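The abstract's core idea, a Hebbian-like weight update gated by a soft rather than hard winner-take-all, can be illustrated with a minimal sketch. This is not the paper's actual rule: the softmax gate, the Oja-like update toward the input, the `temperature` parameter, and the top-k binarization in `softhash_code` are all illustrative assumptions standing in for the unspecified details.

```python
import numpy as np

def soft_wta_hebbian_step(W, x, lr=0.01, temperature=0.5):
    """One Hebbian-like update with a soft winner-take-all gate.

    W: (m, d) synaptic weights projecting a d-dim input to m neurons.
    x: (d,) input sample.
    Under a hard WTA only the maximally active neuron would learn;
    here a softmax gate (an assumption, not the paper's rule) lets
    non-winning neurons receive a small but nonzero learning signal.
    """
    a = W @ x                                   # post-synaptic activations
    gate = np.exp((a - a.max()) / temperature)  # numerically stable softmax
    gate /= gate.sum()                          # soft WTA: winners get most, losers a little
    # Hebbian-style update pulls each neuron's weights toward the input,
    # proportionally to its gated activity (an Oja-like stabilization).
    W += lr * gate[:, None] * (x - W)
    return W

def softhash_code(W, x, k):
    """FlyHash-style sparse code: set the top-k most active neurons to 1."""
    a = W @ x
    code = np.zeros(W.shape[0], dtype=np.uint8)
    code[np.argsort(a)[-k:]] = 1
    return code
```

After training on data, similar inputs activate overlapping neuron sets, so their top-k codes share many set bits, which is what makes such high-dimensional sparse codes usable for similarity search.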
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2335