TL;DR: In this work, we enhance the theoretical guarantees of beam search for graph-based ANN methods.
Abstract: Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the **worst-case** performance of these data structures. Indeed, the influential work of Indyx and Xu (2023) introduced the key concept of $\alpha$-reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya, et. al. 2023) produce an $\left(\frac{\alpha+1}{\alpha-1}\right)$-approximate solution with a simple best-first search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows:
- We introduce **sorted** $\alpha$-reachable graphs, and use this notion to obtain a stronger approximation factor of $\frac{\alpha}{\alpha-1}$ for the DiskANN algorithm on Euclidean metrics.
- We present the **first** worst-case theoretical analysis for the popular **beam-search** algorithm, which is used in practice to search these graphs for $k > 1$ candidate nearest neighbors.
We also present empirical results validating the significance of sorted $\alpha$-reachable graphs, which aligns with our theoretical findings.
Lay Summary: In recent years, graph-based data structures have become a popular and performant option for vector search. However, theoretical understanding of these structures has remained far behind their practical performance. Prior work has provided strong theory for a simplified version of these data structures.
Our findings bring the theoretical understanding of these data structures closer to the practical versions: by adding some additional structure (which is motivated by commonly used modifications in-practice) to the problem, we are able to show better theoretical guarantees. In addition, we prove results for when a query requires more than one vector.
Primary Area: Theory
Keywords: Nearest neighbor search
Submission Number: 10485
Loading