Abstract: Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for high-recall search, which makes them expensive and limits the size of the database. We present a new graph-based indexing and search algorithm called Rand-NSG that can index, store, and search a billion-point database on a single workstation with just 64 GB of RAM and an inexpensive solid-state drive (SSD). Contrary to conventional wisdom, we demonstrate that the SSD-based indices built by Rand-NSG can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (base points per node). On the billion-point SIFT bigann dataset, Rand-NSG serves >5000 queries per second with <3 ms mean latency and 95%+ 1-recall@1, whereas state-of-the-art billion-point ANNS algorithms with a similar memory footprint, such as FAISS and IVFOADC+G+P, plateau at around 50% 1-recall@1. Alternatively, in the high-recall regime, Rand-NSG can index and serve up to 10x more points per node than state-of-the-art graph-based methods such as NSG and HNSW. Moreover, Rand-NSG matches the best-in-class in-memory solutions in terms of the recall vs. latency tradeoff while requiring fewer indexing resources.
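The headline figures above are stated in terms of 1-recall@1. The following is a minimal sketch of how such a number is typically computed, assuming the common definition of k-recall@k: for each query, take the size of the overlap between the k ids returned by the search and the k true nearest neighbors, divide by k, then average over all queries. The function name `k_recall_at_k` and its argument layout are hypothetical and illustrative only, not the authors' evaluation code.

```python
from typing import Sequence


def k_recall_at_k(retrieved: Sequence[Sequence[int]],
                  ground_truth: Sequence[Sequence[int]],
                  k: int) -> float:
    """Mean k-recall@k over a batch of queries (hypothetical helper).

    Assumes the common definition: |top-k retrieved ids ∩ top-k true ids| / k,
    averaged over queries. Ties in distance are not handled specially.
    """
    if len(retrieved) != len(ground_truth):
        raise ValueError("need exactly one result list per query")
    if not retrieved:
        raise ValueError("need at least one query")
    if k <= 0:
        raise ValueError("k must be positive")

    total = 0.0
    for found, truth in zip(retrieved, ground_truth):
        # Count how many of the first k returned ids are true top-k neighbors.
        hits = len(set(found[:k]) & set(truth[:k]))
        total += hits / k
    return total / len(retrieved)


# Four queries, one result each: the true nearest neighbor is returned for 3 of 4.
print(k_recall_at_k([[7], [3], [9], [4]], [[7], [3], [2], [4]], k=1))  # 0.75
```

With k = 1 this reduces to the fraction of queries whose single returned point is the exact nearest neighbor, so "95%+ 1-recall@1" means the true nearest neighbor is found for at least 95% of queries.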
Code Link: http://harsha-simhadri.org/pubs/diskann/index.html
CMT Num: 7667