Simple Yet Efficient Locality Sensitive Hashing with Theoretical Guarantee

Zongyuan Tan; Hongya Wang; Bo Xu; Minjie Luo; Ming Du

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0

Keywords: Locality-sensitive hashing, random sampling, machine learning

Abstract: Locality-sensitive hashing (LSH) is an effective randomized technique widely used in machine learning tasks such as outlier detection, neural network training, and nearest neighbor search. The cost of hashing is the main performance bottleneck in these applications: index construction, a core component that dominates end-to-end latency, requires evaluating a large number of hash functions. Surprisingly, however, little work has been done to improve the efficiency of LSH computation. In this paper, we design a simple yet efficient LSH scheme, named FastLSH, that combines random sampling with random projection. FastLSH reduces the hashing complexity from $O(n)$ to $O(m)$ ($m<n$), where $n$ is the data dimensionality and $m$ is the number of sampled dimensions. More importantly, FastLSH provably satisfies the LSH property, which distinguishes it from fast sketches that lack this guarantee. To demonstrate its broad applicability, we conduct comprehensive experiments on three machine learning tasks: outlier detection, neural network training, and nearest neighbor search. Experimental results show that algorithms powered by FastLSH achieve up to 6.1x, 1.7x, and 20x end-to-end speedups in anomaly detection latency, training time, and index construction, respectively. The source code is available at https://anonymous.4open.science/r/FastLSHForMachineLearning-7CAC.
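To make the $O(n)$ versus $O(m)$ claim concrete, the sketch below contrasts a classic Gaussian random-projection hash of the E2LSH form $h(v)=\lfloor (a\cdot v+b)/w \rfloor$, which reads all $n$ coordinates, with a sampled variant that reads only $m$ of them. This is a minimal illustration under stated assumptions, not the authors' FastLSH: the class names `E2LSHHash` and `SampledLSHHash`, the bucket width `w`, drawing one fixed index set per hash function, sampling without replacement, and leaving `w` unscaled are all choices made here for illustration, and the sketch makes no attempt to reproduce the paper's collision-probability analysis.

```python
import math
import random
from typing import List, Sequence


class E2LSHHash:
    """Classic Gaussian (2-stable) LSH: h(v) = floor((a . v + b) / w).

    Evaluating one hash reads all n coordinates, i.e. O(n) work.
    """

    def __init__(self, n: int, w: float, rng: random.Random) -> None:
        if w <= 0:
            raise ValueError("bucket width w must be positive")
        self.a: List[float] = [rng.gauss(0.0, 1.0) for _ in range(n)]
        self.b: float = rng.uniform(0.0, w)  # random offset in [0, w]
        self.w = w

    def __call__(self, v: Sequence[float]) -> int:
        dot = sum(ai * vi for ai, vi in zip(self.a, v))
        return math.floor((dot + self.b) / self.w)


class SampledLSHHash:
    """Hypothetical sampled variant: project only m of the n coordinates.

    Assumptions (not taken from the paper): one fixed index set per hash
    function, drawn uniformly without replacement, and no rescaling of w.
    Evaluating one hash reads m coordinates, i.e. O(m) work.
    """

    def __init__(self, n: int, m: int, w: float, rng: random.Random) -> None:
        if not 0 < m <= n:
            raise ValueError("need 0 < m <= n")
        if w <= 0:
            raise ValueError("bucket width w must be positive")
        self.idx: List[int] = rng.sample(range(n), m)  # sampled dimensions
        self.a: List[float] = [rng.gauss(0.0, 1.0) for _ in range(m)]
        self.b: float = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, v: Sequence[float]) -> int:
        # Only the m sampled coordinates of v are touched.
        dot = sum(ai * v[i] for ai, i in zip(self.a, self.idx))
        return math.floor((dot + self.b) / self.w)


if __name__ == "__main__":
    rng = random.Random(42)
    n, m, w = 128, 16, 4.0
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    full = [E2LSHHash(n, w, rng) for _ in range(8)]
    fast = [SampledLSHHash(n, m, w, rng) for _ in range(8)]
    print("full-projection signature:", [h(x) for h in full])
    print("sampled signature:        ", [h(x) for h in fast])
```

Each `E2LSHHash` call performs $n$ multiply-adds while each `SampledLSHHash` call performs $m$, so with $n=128$ and $m=16$ the hashing arithmetic per function shrinks by a factor of $n/m = 8$. Whether such a sampled hash retains the LSH property, and with what parameters, is exactly what the paper's analysis addresses; this sketch only illustrates the cost difference.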

Primary Area: infrastructure, software libraries, hardware, systems, etc.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6094
