Abstract: Retrieval-Augmented Generation (RAG) systems typically rely on dense floating-point embeddings to retrieve relevant documents, but this approach incurs significant memory and compute costs at scale. We propose a Hyperdimensional Computing (HDC) framework that projects transformer token embeddings into high-dimensional binary hypervectors, which are aggregated into compact document representations. To support sublinear search, we introduce HD-NSW, a graph-based index inspired by navigable small-world networks. HD-NSW clusters similar hypervectors into bundled centroids and connects them with sparse Hamming-distance edges, enabling efficient, beam-guided traversal entirely in the binary domain. Across 15 BEIR benchmarks and synthetic Gaussian mixture corpora, HD-NSW achieves over 99% of dense retrieval quality, reduces memory usage by $8 \times$, and supports over 860 queries per second at 10 million documents while maintaining over 80% throughput at 40 million documents. At five million documents, HD-NSW achieves $7.68 \times$ higher throughput than state-of-the-art approximate nearest neighbor methods. Beyond this point, competing baselines exhaust available memory, while HD-NSW continues to scale and maintains high throughput at larger corpus sizes.
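The encoding step described in the abstract (project token embeddings to binary hypervectors, aggregate them into one document vector, compare in the binary domain) can be illustrated with a minimal sketch. This is not the authors' implementation: the random sign projection, the majority-vote bundling, and the names `make_projection`, `encode_document`, and `hamming_distance` are all assumptions chosen for illustration, and the dimensions are arbitrary.

```python
import numpy as np


def make_projection(embed_dim, hv_dim, seed=0):
    """Hypothetical random Gaussian projection matrix (an assumption, not from the paper)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((embed_dim, hv_dim))


def encode_document(token_embeddings, projection):
    """Map (n_tokens, embed_dim) float embeddings to one bit-packed binary hypervector."""
    # Binarize each token: keep only the sign of its random projection.
    token_bits = (token_embeddings @ projection) > 0  # (n_tokens, hv_dim), bool
    # Bundle tokens by bitwise majority vote (ties resolve to 0).
    votes = token_bits.sum(axis=0)
    doc_bits = votes * 2 > token_bits.shape[0]
    # Pack 8 bits per byte so the stored vector takes hv_dim / 8 bytes.
    return np.packbits(doc_bits)


def hamming_distance(a, b):
    """Number of differing bits between two bit-packed hypervectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    projection = make_projection(embed_dim=32, hv_dim=2048)

    doc = rng.standard_normal((10, 32))                   # 10 toy "token embeddings"
    near = doc + 0.01 * rng.standard_normal((10, 32))     # slightly perturbed copy
    far = rng.standard_normal((10, 32))                   # unrelated document

    hv_doc = encode_document(doc, projection)
    hv_near = encode_document(near, projection)
    hv_far = encode_document(far, projection)

    print("bytes per document:", hv_doc.nbytes)
    print("distance to near copy:", hamming_distance(hv_doc, hv_near))
    print("distance to unrelated:", hamming_distance(hv_doc, hv_far))
```

In this sketch a perturbed copy of a document lands at a much smaller Hamming distance than an unrelated one, which is the property a Hamming-distance graph index would rely on. The clustering into bundled centroids and the beam-guided traversal of HD-NSW are not shown here.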
External IDs: dblp:conf/IEEEpact/LeeJGPK25