Keywords: nearest neighbor search, filtered search, graph indices, filters
TL;DR: We study graph-based algorithms, with provable guarantees, for filtered nearest neighbor search with many filters.
Abstract: We study nearest neighbor search with filter constraints (MultiFilterANN): given a query vector with a discrete set of labels $S$, retrieve the (approximately) closest vector from a dataset under the constraint that $S$ must be a subset of the labels of the retrieved vector. There has been a burgeoning interest in this problem on the practical side, due to its strong motivation from search and recommendation applications where vector labels correspond to real-world attributes such as date, price, or color. On the theoretical side, this problem generalizes the subset query problem, which asks us to determine only whether $S$ is a subset of some set in the dataset, without retrieving the closest vector.
In this work, we present a systematic study of MultiFilterANN. Theoretically, we demonstrate the power of graph-based algorithms in two ways:
- We design provable algorithms with the best known space-time tradeoffs for MultiFilterANN in the large filter regime by carefully incorporating ANN algorithms into known subset query algorithms.
- We demonstrate lower bounds for popular algorithms for MultiFilterANN, showing that they can catastrophically fail even on simple data/label sets.
Our theoretical results inspire our empirical approach, where we extend practical graph indices for standard nearest neighbor search to MultiFilterANN by augmenting the (greedy) search procedure with a penalized distance function that captures filter constraints. Our empirical algorithm is competitive with existing state-of-the-art solutions that are tailored for one or two filters, while also generalizing to any number of filters without modification. Lastly, we release multiple novel datasets for MultiFilterANN, filling a noticeable gap in the literature.
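To make the problem statement and the penalized search idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: the additive penalty (a fixed cost per query label missing from a candidate), the beam-style greedy search, and all names (`penalized_distance`, `brute_force_filtered_nn`, `greedy_filtered_search`, `penalty`, `beam`) are hypothetical choices made for this example. The abstract does not specify the actual penalty function.

```python
import heapq
import math


def penalized_distance(q_vec, q_labels, p_vec, p_labels, penalty=1.0):
    """Euclidean distance plus a fixed cost per query label the point lacks.

    Assumption: this additive form is illustrative only.
    """
    missing = len(q_labels - p_labels)
    return math.dist(q_vec, p_vec) + penalty * missing


def brute_force_filtered_nn(vectors, labels, q_vec, q_labels):
    """Exact reference: index of the closest point whose labels contain q_labels."""
    best, best_dist = None, float("inf")
    for i, (vec, lab) in enumerate(zip(vectors, labels)):
        if q_labels <= lab:  # S must be a subset of the point's labels
            d = math.dist(q_vec, vec)
            if d < best_dist:
                best, best_dist = i, d
    return best


def greedy_filtered_search(graph, vectors, labels, q_vec, q_labels,
                           start, penalty=1.0, beam=4):
    """Greedy (beam) search on an adjacency-list graph using the penalized score."""
    def score(i):
        return penalized_distance(q_vec, q_labels, vectors[i], labels[i], penalty)

    visited = {start}
    frontier = [(score(start), start)]  # min-heap of nodes still to expand
    results = [(score(start), start)]   # best `beam` nodes seen so far
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the best unexpanded node is worse than every kept result.
        if len(results) >= beam and d > max(results)[0]:
            break
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                s = score(nb)
                heapq.heappush(frontier, (s, nb))
                results.append((s, nb))
        results = sorted(results)[:beam]
    # Return the best-scoring candidate that satisfies the filter exactly.
    for _, i in results:
        if q_labels <= labels[i]:
            return i
    return None


# Toy data: five points on a line, connected as a path graph.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
labels = [{"blue"}, {"blue"}, {"red", "cheap"}, {"red"}, {"blue"}]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

exact = brute_force_filtered_nn(vectors, labels, (0.2,), {"red"})
approx = greedy_filtered_search(graph, vectors, labels, (0.2,), {"red"},
                                start=4, penalty=10.0)
```

With a large `penalty`, points that violate the filter are pushed far away in the search order but still remain traversable, so the search can pass through them to reach matching points. With `penalty=0` the procedure reduces to ordinary unfiltered greedy search followed by a filter check on the final candidates.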
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8377