LIRA: A Learning-based Query-aware Partition Framework for Large-scale ANN Search

Ximu Zeng; Liwei Deng; Penghao Chen; Xu Chen; Han Su; Kai Zheng

Published: 29 Jan 2025, Last Modified: 29 Jan 2025

Venue: WWW 2025 Poster

License: CC BY 4.0

Track: Search and retrieval-augmented AI

Keywords: Approximate nearest neighbor search, Learning-to-index

Abstract: Approximate nearest neighbor (ANN) search is fundamental to applications such as information retrieval. To improve efficiency, partition-based methods narrow the search space by probing only a subset of the partitions, yet they face two common issues. First, in the query phase, a strategy widely adopted by existing methods such as IVF is to probe partitions according to the distance rank of the query to each partition centroid. Because this ignores the data distribution, it inevitably probes irrelevant partitions. Second, in the partition construction phase, all partition-based methods suffer from the boundary problem: a query's $k$NN are split across multiple partitions, producing a long-tailed $k$NN distribution that degrades the optimal $nprobe$ (i.e., the number of partitions probed) and the search efficiency. To address these problems, we propose LIRA, a LearnIng-based queRy-aware pArtition framework. Specifically, we propose a probing model that learns to directly probe the partitions containing the $k$NN of a query. Probing partitions with the model reduces probing waste and enables query-aware probing with a query-specific $nprobe$. Moreover, we incorporate the probing model into a learning-based redundancy strategy to mitigate the adverse impact of the long-tailed $k$NN distribution on partition probing. Extensive experiments on real-world vector datasets demonstrate the superiority of LIRA in the trade-off among accuracy, latency, and query fan-out. The results show that LIRA consistently reduces latency and query fan-out by up to 30\%.
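The two issues named in the abstract can be illustrated with a small sketch of the baseline centroid-rank probing that IVF-style methods use. This is not LIRA's probing model or redundancy strategy. It is a toy under stated assumptions: synthetic Gaussian vectors, randomly sampled centroids standing in for k-means, and hypothetical helper names chosen for illustration. It counts how many partitions actually hold a query's $k$NN and how large $nprobe$ must be before centroid-rank probing reaches all of them; the gap between the two is the probing waste.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_partitions(data, centroids):
    """Assign each vector to its nearest centroid (IVF-style partitioning)."""
    ids = range(len(centroids))
    return [min(ids, key=lambda c: sq_dist(v, centroids[c])) for v in data]

def probe_by_centroid_rank(query, centroids, nprobe):
    """Ids of the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))
    return order[:nprobe]

def exact_knn(query, data, k):
    """Brute-force indices of the k nearest vectors (ground truth)."""
    return sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:k]

def nprobe_for_full_recall(query, data, centroids, assignment, k):
    """Return (needed, relevant): the smallest nprobe at which centroid-rank
    probing covers every true neighbor, and the number of partitions that
    actually contain a true neighbor."""
    knn_parts = {assignment[i] for i in exact_knn(query, data, k)}
    order = probe_by_centroid_rank(query, centroids, len(centroids))
    # Rank of the worst-placed partition that still holds a true neighbor.
    needed = max(order.index(p) for p in knn_parts) + 1
    return needed, len(knn_parts)

# Toy setup (assumption): 2000 Gaussian vectors in 8 dimensions, 16 partitions.
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(data, 16)  # random centroids stand in for k-means
assignment = assign_partitions(data, centroids)
query = [rng.gauss(0, 1) for _ in range(8)]

needed, relevant = nprobe_for_full_recall(query, data, centroids, assignment, k=10)
print(f"partitions holding kNN: {relevant}, nprobe needed by centroid rank: {needed}")
print(f"partitions probed without containing any kNN: {needed - relevant}")
```

When `relevant` is greater than 1, the query's neighbors straddle partition boundaries; when `needed` exceeds `relevant`, centroid-rank probing visits partitions that contribute nothing. A model that predicts the relevant partitions directly, as the abstract describes, would aim to close that gap per query.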

Submission Number: 1603
