Modeling SRP-LSH Performance: A Theoretical Framework for Optimizing Approximate Nearest Neighbor Search
Keywords: Approximate Nearest Neighbor Search; Locality-Sensitive Hashing; Theoretical Analysis; Parameter Optimization; High-Dimensional Retrieval
TL;DR: We present a theoretical framework that rigorously models SRP-LSH performance and enables principled parameter configuration.
Abstract: Approximate nearest neighbor (ANN) search in high-dimensional spaces with sign-random-projection locality-sensitive hashing (SRP-LSH) remains challenging due to the lack of principled approaches for configuring its key parameters. We present a theoretical framework that rigorously models SRP-LSH performance and enables principled parameter configuration. At its core is, to our knowledge, the first analytical model that links the number of hash functions and the Hamming distance threshold to search recall, rooted in the binomial distribution of bit collisions and the angular similarity distribution of vectors. Building upon this model, we develop an adaptive optimization algorithm that minimizes the candidate set size while satisfying user-specified recall targets. Extensive experiments show that our model typically predicts recall with a mean absolute percentage error (MAPE) below 5%. Moreover, our algorithm consistently meets the specified recall targets while capturing the global trend in selectivity. Overall, this framework provides a theoretically grounded and practical solution for configuring SRP-LSH in real-world retrieval systems.
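The link between the number of hash functions, the Hamming distance threshold, and recall can be illustrated with a minimal sketch. It is not the paper's model, only an illustration that assumes the standard SRP property: two vectors at angle theta disagree on any single sign bit with probability theta / pi, and bits are independent, so the Hamming distance between their k-bit codes is binomial. The function names and the use of an empirical list of neighbor angles are assumptions made for this example.

```python
import math

def hamming_within_threshold_prob(theta, k, r):
    """P(Hamming distance <= r) for two vectors at angle theta (radians)
    hashed with k independent SRP bits. Assumes each bit differs with
    probability theta / pi, so the distance is Binomial(k, theta / pi)."""
    p = theta / math.pi
    return sum(math.comb(k, i) * p**i * (1 - p)**(k - i) for i in range(r + 1))

def expected_recall(neighbor_angles, k, r):
    """Estimated recall: the retrieval probability averaged over the angles
    between queries and their true neighbors (an empirical stand-in for the
    angular similarity distribution)."""
    total = sum(hamming_within_threshold_prob(t, k, r) for t in neighbor_angles)
    return total / len(neighbor_angles)

def smallest_threshold(neighbor_angles, k, target_recall):
    """Smallest Hamming threshold r whose estimated recall meets the target.
    For a fixed k, a smaller r admits fewer candidates, so this is a simple
    proxy for minimizing the candidate set size (hypothetical helper)."""
    for r in range(k + 1):
        if expected_recall(neighbor_angles, k, r) >= target_recall:
            return r
    return k
```

For instance, `smallest_threshold(angles, 64, 0.9)` would return the tightest threshold for 64-bit codes that this simplified model expects to reach 90% recall, given a sample `angles` of query-to-neighbor angles in radians.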
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 6930