Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space

Published: 01 Jan 2016, Last Modified: 25 May 2024, CoRR 2016, CC BY-SA 4.0
Abstract: We study the $r$-near neighbors reporting problem ($r$-NN), i.e., reporting \emph{all} points in a high-dimensional point set $S$ that lie within a radius $r$ of a given query point $q$. Our approach builds on the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving $r$-NN is that its performance is sensitive to data- and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for $r$-NN in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time-efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space.
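The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the auxiliary data structure, so this sketch assumes the simplest possible cost estimate, namely the summed sizes of the buckets the query falls into, which counts a point once per table it collides in. The class `HybridIndex`, its parameters (`k`, `num_tables`, `w`), and the decision rule "use linear search when the estimated cost is at least $|S|$" are all hypothetical choices made for illustration. The hash family is the standard random-projection scheme for Euclidean distance.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def make_hash(dim, k, w, rng):
    """Concatenate k random-projection hashes floor((a.x + b) / w)."""
    params = [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w))
              for _ in range(k)]

    def h(x):
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)
            for a, b in params
        )

    return h


class HybridIndex:
    """Hypothetical hybrid of LSH-based search and linear search."""

    def __init__(self, points, k=4, num_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.points = points
        dim = len(points[0])
        self.hashes = [make_hash(dim, k, w, rng) for _ in range(num_tables)]
        self.tables = []
        for h in self.hashes:
            table = {}
            for i, p in enumerate(points):
                table.setdefault(h(p), []).append(i)
            self.tables.append(table)

    def estimated_lsh_cost(self, q):
        # Assumption: summed bucket sizes stand in for the cost estimate.
        # This over-counts points that collide with q in several tables.
        return sum(len(t.get(h(q), ()))
                   for h, t in zip(self.hashes, self.tables))

    def linear_search(self, q, r):
        return sorted(i for i, p in enumerate(self.points) if dist(p, q) <= r)

    def lsh_search(self, q, r):
        candidates = set()
        for h, t in zip(self.hashes, self.tables):
            candidates.update(t.get(h(q), ()))
        # Filter candidates by true distance, so no false positives remain.
        return sorted(i for i in candidates if dist(self.points[i], q) <= r)

    def query(self, q, r):
        # Hypothetical rule: fall back to a scan when LSH would touch
        # at least as many entries as scanning the whole point set.
        if self.estimated_lsh_cost(q) >= len(self.points):
            return "linear", self.linear_search(q, r)
        return "lsh", self.lsh_search(q, r)
```

A call such as `HybridIndex(points).query(q, r)` returns the chosen strategy together with the indices of the reported points. In dense regions, where the query's buckets hold a large share of $S$, the estimate is high and the sketch scans linearly; in sparse regions it uses the hash tables. A duplicate-aware estimate of the candidate set would give a tighter decision rule than the summed bucket sizes assumed here.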