IVF$^{2}$ Index: Fusing Classic and Spatial Inverted Indices for Fast Filtered ANNS

Published: 12 Jun 2025, Last Modified: 06 Jul 2025, Venue: VecDB 2025, License: CC BY 4.0
Keywords: vector search, ANNS, ANN, similarity search, approximate nearest neighbor search
TL;DR: A novel index for ANNS queries requiring the intersection of metadata labels.
Abstract: The rise of metric embeddings as a crucial tool in search, recommendation, and large language model applications has created significant interest in complex search queries over vectors, such as restricted vector search based on per-vector metadata ("filtered ANNS"). The NeurIPS'23 BigANN competition's Filter track evaluated submissions by query throughput above a target level of recall on a 10M-vector dataset with binary per-vector metadata (labels), where each query predicate required either one or two specified labels to be present on every vector returned. Existing state-of-the-art approaches for filtered ANNS struggle to perform such "AND" queries, which require the returned vectors to have all of a set of specified binary labels. Perhaps surprisingly, we find that a more combinatorial view of the problem leads to highly efficient solutions, approaching and sometimes even exceeding the throughput of unfiltered search on the full dataset. We present the IVF$^2$ index, a novel approach to indexing vectors to serve these queries, which leverages classical inverted indices and inverted file (IVF) indices in tandem to dramatically reduce the number of vectors that must be considered before any of them is compared to the query vector. We demonstrate empirically strong results on the competition dataset, exceeding the throughput of the runner-up submission by a factor of 1.97x and the organizer-provided baseline by a factor of 11.58x.
Submission Number: 10
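
Illustrative sketch (not the authors' implementation): the abstract does not specify how IVF$^2$ fuses its two kinds of index, so the toy below only shows the general idea it describes, namely shrinking the candidate set with label posting lists and spatial cluster lists before any distance to the query is computed. The class name `ToyFilteredIndex`, its parameters (`k`, `nprobe`), the fixed centroids, and the sample data are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyFilteredIndex:
    """Hypothetical toy: label posting lists plus IVF-style cluster assignments."""

    def __init__(self, vectors, labels, centroids):
        self.vectors = vectors
        self.centroids = centroids
        self.by_label = defaultdict(set)  # classic inverted index: label -> vector ids
        self.cluster_of = []              # spatial assignment: vector id -> centroid id
        for vid, (vec, labs) in enumerate(zip(vectors, labels)):
            for lab in labs:
                self.by_label[lab].add(vid)
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(vec, centroids[c]))
            self.cluster_of.append(nearest)

    def search(self, query, required_labels, k=1, nprobe=1):
        # Step 1: AND filter, intersecting the posting lists of all required labels.
        postings = [self.by_label.get(lab, set()) for lab in required_labels]
        if postings:
            candidates = set.intersection(*postings)
        else:
            candidates = set(range(len(self.vectors)))
        # Step 2: pick the nprobe centroids closest to the query.
        probed = set(heapq.nsmallest(
            nprobe, range(len(self.centroids)),
            key=lambda c: sq_dist(query, self.centroids[c])))
        # Step 3: keep only candidates that live in a probed cluster.
        candidates = [vid for vid in candidates if self.cluster_of[vid] in probed]
        # Step 4: only now compare the surviving vectors to the query.
        return heapq.nsmallest(
            k, candidates, key=lambda vid: sq_dist(query, self.vectors[vid]))


# Made-up sample data: five 2-D vectors, each with a set of binary labels.
vectors = [(0, 0), (0, 1), (5, 5), (5, 6), (1, 0)]
labels = [{"red", "round"}, {"red"}, {"red", "round"}, {"round"}, {"red", "round"}]
centroids = [(0, 0), (5, 5)]

index = ToyFilteredIndex(vectors, labels, centroids)
result = index.search((0.1, 0.9), {"red", "round"}, k=2, nprobe=1)
print(result)
```

In this toy data, vector 1 is the closest point to the query but lacks the label `round`, so it is discarded by the posting-list intersection before any distance is computed for it. Probing a single cluster also removes vector 2, leaving two distance computations instead of five.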