Keywords: Approximate Nearest Neighbor, Hierarchical Navigable Small World (HNSW), Vector Quantization, Locally Adaptive Vector Quantization (LAVQ), Vector Search, Large Language Models, Retrieval-Augmented Generation (RAG), Scalable Indexing, Memory-Efficient Search, Low-Latency Vector Databases
Abstract: Graph-based approximate nearest neighbor search, specifically Hierarchical Navigable Small World (HNSW), remains the standard for low-latency vector retrieval.
However, as datasets grow to millions of high-dimensional embeddings, the RAM
requirements for full-precision (float32) indices become prohibitive. While Scalar
Quantization (SQ) can reduce this footprint, naive min-max scaling often fails in
practice: a handful of outliers can stretch the quantization bins, causing “collapse”
where useful data distinctions are lost. We propose LAVQ (Locally Adaptive
Vector Quantization), a modification to HNSW that employs a percentile-based
clipping strategy. By dynamically adapting quantization bounds per dimension,
LAVQ ignores statistical outliers to preserve fidelity in the dense regions of the
vector space. We further accelerate search using custom AVX2 integer intrinsics.
On the SIFT1M benchmark, LAVQ cuts memory usage by 3.8× and improves
query throughput (QPS) by 4.4× over float32 baselines, achieving recall comparable to state-of-the-art implementations like FAISS.
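The percentile-based clipping idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the percentile parameters (`p_low`, `p_high`) are assumptions chosen for clarity.

```python
import numpy as np

def quantize_percentile(X, p_low=1.0, p_high=99.0):
    """Scalar-quantize each dimension to uint8 with outlier clipping.

    Instead of min-max bounds, per-dimension percentiles set the
    quantization range, so a few extreme values cannot stretch the
    bins and collapse the dense region of the distribution.
    """
    lo = np.percentile(X, p_low, axis=0)    # lower bound per dimension
    hi = np.percentile(X, p_high, axis=0)   # upper bound per dimension
    scale = np.where(hi > lo, hi - lo, 1.0)  # avoid divide-by-zero on flat dims
    Xc = np.clip(X, lo, hi)                  # statistical outliers saturate
    codes = np.round((Xc - lo) / scale * 255).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximate float32 reconstruction from uint8 codes."""
    return codes.astype(np.float32) / 255.0 * scale + lo
```

With 256 levels per dimension, the worst-case reconstruction error for in-range values is half a bin width, `(hi - lo) / 510` per dimension, while clipped outliers saturate at the bounds rather than degrading the resolution of the dense region.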
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 19994