Benchmarking Approximate k-Nearest Neighbour Search for Big High Dimensional Dynamic Data

Ben Harwood; Amir Dezfouli; Iadine Chades

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone

Keywords: Nearest Neighbour Search, Similarity Search, Indexing, Knowledge retrieval, Knowledge discovery, High Dimensional Data, Big Data, Large Scale, Hashing, Graph Traversal, Product Quantisation, Online Learning, Representation Learning, Metric Learning, Robotic Vision

TL;DR: A novel framework for benchmarking Approximate k-Nearest Neighbour (ANN) methods on big high-dimensional dynamic data. It identifies suitable ANN methods for ML and other applications and is intended to accelerate future ANN research.

Abstract: Approximate k-Nearest Neighbour (ANN) methods are commonly used for mining information from big high-dimensional datasets. For each application, the high-level dataset properties and run-time requirements determine which method provides the most suitable tradeoffs. However, because comprehensive benchmarking is lacking, judicious method selection is not currently possible for ANN applications that involve frequent online changes to datasets. Here we address this issue by building upon existing benchmarks for static search problems to provide a new benchmarking framework for big high-dimensional dynamic data. We apply our framework to dynamic scenarios modelled after common real-world applications. In all cases we are able to identify a suitable recall-runtime tradeoff that improves upon a worst-case exhaustive search. Our framework provides a flexible solution to accelerate future ANN research and to enable researchers in other online data-rich domains to find suitable methods for handling their ANN searches.
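
The recall-runtime tradeoff mentioned in the abstract can be illustrated with a minimal sketch. It is not the framework from the submission: `exhaustive_knn`, `recall_at_k`, and `sampled_knn` are hypothetical names, and `sampled_knn` is only a toy stand-in for an ANN method (exact search over a random subset). The sketch assumes Euclidean distance, applies a few deletions and insertions to mimic a dynamic dataset, and then measures recall against the exhaustive baseline.

```python
import heapq
import math
import random


def exhaustive_knn(data, query, k):
    """Exact k nearest neighbours by scanning every live point (the worst-case baseline)."""
    return heapq.nsmallest(k, data, key=lambda i: math.dist(data[i], query))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def sampled_knn(data, query, k, fraction, rng):
    """Toy stand-in for an ANN method (an assumption, not a real library):
    exact search restricted to a random subset of the live points."""
    n = max(k, int(len(data) * fraction))
    subset = rng.sample(sorted(data), n)
    return heapq.nsmallest(k, subset, key=lambda i: math.dist(data[i], query))


rng = random.Random(0)
dim, k = 16, 5

# The dataset maps point id -> vector, so points can be added and removed online.
data = {i: [rng.random() for _ in range(dim)] for i in range(500)}

# Dynamic updates before querying: delete 50 points, insert 100 new ones.
for i in range(50):
    del data[i]
for i in range(500, 600):
    data[i] = [rng.random() for _ in range(dim)]

query = [rng.random() for _ in range(dim)]
exact = exhaustive_knn(data, query, k)
approx = sampled_knn(data, query, k, fraction=0.5, rng=rng)
recall = recall_at_k(approx, exact)
```

Scanning a smaller `fraction` lowers the work per query but typically lowers recall as well; a benchmark of the kind described would record both quantities for each method and update pattern.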

Please Choose The Closest Area That Your Submission Falls Into: Infrastructure (e.g., datasets, competitions, implementations, libraries)

Supplementary Material: zip
