Hierarchical Epsilon-Net Graphs: Time Guarantees for HNSW in Approximate Nearest Neighbor Search

ICLR 2026 Conference Submission21454 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Similarity Search, Approximate Nearest Neighbor Search, Retrieval, Epsilon Nets, Theoritical Time Guarantee
TL;DR: We define HENN, a general property of hierarchical graph-based structures for ANN, based on $\varepsilon$-net thoery. Using this, we provide probabilistic time guarantees for HNSW and similar indecies.
Abstract: Hierarchical graph-based algorithms such as HNSW achieve state-of-the-art performance for Approximate Nearest Neighbor (ANN) search in practice, but they often lack theoretical guarantees on query time or recall due to their heavy use of randomized heuristic constructions. In contrast, existing theoretically grounded structures are typically difficult to implement and struggle to scale in real-world scenarios. We introduce a property of hierarchical graphs called Hierarchical $\varepsilon$-Net Navigation (HENN), grounded in $\varepsilon$-net theory from computational geometry. This framework allows us to establish time bounds for ANN search on graphs that satisfy the HENN property. The design of HENN is agnostic to the underlying proximity graph used at each layer, treating it as a black box. We further show that HNSW satisfies the HENN property with high probability, enabling us to derive formal time guarantees for HNSW. Constructing a HENN graph relies on finding $\varepsilon$-nets. Existing methods for finding $\varepsilon$-nets are either probabilistic or, when deterministic, become impractical in high dimensions. To address this, we propose a budget-aware practical algorithm for building $\varepsilon$-nets, under a user-specified preprocessing time budget. Empirical evaluations confirm our theoretical guarantees for both HENN and HNSW, and demonstrate the effectiveness of the proposed budget-aware algorithm for constructing HENN and, more generally, $\varepsilon$-nets. This flexibility allows practitioners to select the method that best fits their specific use case.
Primary Area: learning theory
Submission Number: 21454
Loading