Keywords: Approximate Nearest Neighbor, Hierarchical Navigable Small World (HNSW), Vector Quantization, Locally Adaptive Vector Quantization (LVQ), Vector Search, Large Language Models, Retrieval-Augmented Generation (RAG), Scalable Indexing, Memory-Efficient Search, Low-Latency Vector Databases
Abstract: This paper presents a novel optimization strategy for
high-performance approximate nearest neighbor (ANN) search, a
critical requirement in modern vector search applications driven
by large language models and retrieval-augmented generation.
Addressing the inherent memory and latency challenges of the
popular Hierarchical Navigable Small World (HNSW) algorithm,
we introduce HNSW-LVQ (Locally Adaptive Vector Quantization
for HNSW). Our methodology incorporates a per-dimension
quantization scheme that efficiently compresses floating-point
vectors into integer representations, thereby significantly reducing
memory overhead and accelerating distance computations.
Empirical validation on the SIFT 10K dataset demonstrates
that HNSW-LVQ achieves an 85% reduction in query latency
and a substantial reduction in memory footprint, with only
a 2% decrease in recall. These results validate the
efficacy of integrating quantization techniques into graph-based
indexing, offering a pragmatic optimization pathway for the
development of industrial-grade vector databases.
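To make the abstract's core idea concrete, the sketch below shows one plausible form of locally adaptive scalar quantization: each vector is compressed to 8-bit integers using its own (offset, scale) pair, so distances can be computed on compact integer codes. This is an illustrative assumption, not the paper's actual implementation; the function names and the per-vector min/max scheme are hypothetical.

```python
import numpy as np

def lvq_quantize(x: np.ndarray, bits: int = 8):
    """Uniformly quantize one float vector to unsigned integers.

    Hypothetical sketch of locally adaptive quantization: a per-vector
    (offset, scale) pair maps each component into [0, 2**bits - 1].
    """
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2**bits - 1)
    if scale == 0.0:          # constant vector: avoid division by zero
        scale = 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale

def lvq_dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Recover an approximate float vector from its integer codes."""
    return q.astype(np.float32) * scale + lo

# Usage: compress a 128-dim SIFT-like vector (4x smaller than float32)
rng = np.random.default_rng(0)
x = rng.random(128).astype(np.float32)
q, lo, scale = lvq_quantize(x)
x_hat = lvq_dequantize(q, lo, scale)
# Reconstruction error is bounded by one quantization step
assert np.max(np.abs(x - x_hat)) <= scale
```

Because every code fits in one byte, a database of n 128-dimensional vectors needs roughly n * 128 bytes for the codes plus two floats per vector, versus n * 512 bytes uncompressed, which is the kind of memory saving the abstract attributes to HNSW-LVQ.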
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Supplementary Material: zip
Submission Number: 19994