Quantization-Enhanced HNSW for Scalable Approximate Vector Search

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Approximate Nearest Neighbor, Hierarchical Navigable Small World (HNSW), Vector Quantization, Locally Adaptive Vector Quantization (LAVQ), Vector Search, Large Language Models, Retrieval-Augmented Generation (RAG), Scalable Indexing, Memory-Efficient Search, Low-Latency Vector Databases
Abstract: Graph-based approximate nearest neighbor search, specifically Hierarchical Navigable Small World (HNSW), remains the standard for low-latency vector retrieval. However, as datasets grow to millions of high-dimensional embeddings, the RAM requirements for full-precision (float32) indices become prohibitive. While Scalar Quantization (SQ) can reduce this footprint, naive min-max scaling often fails in practice: a handful of outliers can stretch the quantization bins, causing “collapse” where useful data distinctions are lost. We propose LAVQ (Locally Adaptive Vector Quantization), a modification to HNSW that employs a percentile-based clipping strategy. By dynamically adapting quantization bounds per dimension, LAVQ ignores statistical outliers to preserve fidelity in the dense regions of the vector space. We further accelerate search using custom AVX2 integer intrinsics. On the SIFT1M benchmark, LAVQ cuts memory usage by 3.8× and improves query throughput (QPS) by 4.4× over float32 baselines, achieving recall comparable to state-of-the-art implementations like FAISS.
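The core idea — per-dimension scalar quantization with percentile-clipped bounds instead of min-max bounds — can be illustrated with a short sketch. This is not the paper's implementation; the function names and the percentile parameters (`lo_pct`, `hi_pct`) are illustrative assumptions, and the sketch uses NumPy rather than the AVX2 integer kernels described in the abstract.

```python
import numpy as np

def percentile_clipped_quantize(X, lo_pct=1.0, hi_pct=99.0):
    """Quantize float vectors to uint8, one scale per dimension.

    Unlike min-max scaling, the bounds come from percentiles, so a few
    extreme outliers cannot stretch the quantization bins and wash out
    distinctions in the dense region of each dimension.
    (lo_pct / hi_pct are illustrative defaults, not values from the paper.)
    """
    lo = np.percentile(X, lo_pct, axis=0)    # per-dimension lower bound
    hi = np.percentile(X, hi_pct, axis=0)    # per-dimension upper bound
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant dims
    Xc = np.clip(X, lo, hi)                  # outliers saturate at the bounds
    codes = np.round((Xc - lo) / scale * 255).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Map uint8 codes back to approximate float32 values."""
    return codes.astype(np.float32) / 255.0 * scale + lo
```

With min-max bounds, a single outlier of magnitude 1000 in a dimension whose typical range is [-3, 3] would leave only a handful of usable codes for the dense region; with percentile clipping, the outlier saturates to code 255 and the remaining codes cover the dense region at full resolution.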
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 19994