A Multilabel Classification Framework for Approximate Nearest Neighbor Search

Ville Oskari Hyvönen; Elias Jääsaari; Teemu Roos

Published: 31 Oct 2022, Last Modified: 15 Dec 2022. NeurIPS 2022 Accept. Readers: Everyone

Keywords: approximate nearest neighbor search, multilabel classification, statistical learning theory

Abstract: Both supervised and unsupervised machine learning algorithms have been used to learn partition-based index structures for approximate nearest neighbor (ANN) search. Existing supervised algorithms formulate the learning task as finding a partition in which the nearest neighbors of a training set point belong to the same partition element as the point itself, so that the nearest neighbor candidates can be retrieved by naive lookup or backtracking search. We formulate candidate set selection in ANN search directly as a multilabel classification problem where the labels correspond to the nearest neighbors of the query point, and interpret the partitions as partitioning classifiers for solving this task. Empirical results suggest that the natural classifier based on this interpretation leads to strictly improved performance when combined with any unsupervised or supervised partitioning strategy. We also prove a sufficient condition for consistency of a partitioning classifier for ANN search, and illustrate the result by verifying this condition for chronological $k$-d trees.

TL;DR: We formulate approximate nearest neighbor search as a multilabel classification problem and provide a sufficient condition for consistency of partitioning classifiers under this formulation.

Supplementary Material: zip
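
To make the formulation in the abstract concrete, here is a minimal Python sketch of a single partition used as a multilabel classifier. It is an illustration under stated assumptions, not the authors' implementation: the random-hyperplane partition, the class name `PartitionClassifier`, the helpers `knn_labels` and `search`, and the `threshold` parameter are all hypothetical choices made for this example. Each training query is labeled with the indices of its k nearest corpus points; each partition cell stores, for every corpus point, the fraction of training queries in that cell that have it among their nearest neighbors; the candidate set for a new query is the set of corpus points whose stored fraction exceeds the threshold.

```python
import numpy as np


def knn_labels(corpus, queries, k):
    """Exact k-NN indices of each query in the corpus (brute force)."""
    d2 = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


class PartitionClassifier:
    """Hypothetical sketch: one random-hyperplane partition as a multilabel classifier."""

    def __init__(self, dim, n_bits, rng):
        # Assumption: the partition is induced by the signs of random projections.
        self.planes = rng.standard_normal((n_bits, dim))
        self.weights = 1 << np.arange(n_bits)
        self.label_freq = {}  # cell id -> per-corpus-point label frequency

    def cell(self, x):
        """Map each row of x to the integer id of its partition cell."""
        bits = (np.atleast_2d(x) @ self.planes.T > 0).astype(np.int64)
        return bits @ self.weights

    def fit(self, corpus, train_queries, k):
        labels = knn_labels(corpus, train_queries, k)  # multilabel targets
        cells = self.cell(train_queries)
        for c in np.unique(cells):
            members = labels[cells == c]  # shape (m, k)
            counts = np.bincount(members.ravel(), minlength=len(corpus))
            # Fraction of training queries in this cell that have each
            # corpus point among their k nearest neighbors.
            self.label_freq[int(c)] = counts / len(members)
        return self

    def candidates(self, q, threshold):
        """Corpus indices whose estimated label frequency exceeds threshold."""
        freq = self.label_freq.get(int(self.cell(q)[0]))
        if freq is None:  # no training query fell into this cell
            return np.array([], dtype=np.int64)
        return np.flatnonzero(freq > threshold)


def search(corpus, clf, q, k, threshold):
    """Select candidates with the classifier, then rank them by exact distance."""
    cand = clf.candidates(q, threshold)
    if cand.size == 0:
        return cand
    d2 = ((corpus[cand] - q) ** 2).sum(axis=1)
    return cand[np.argsort(d2)[:k]]
```

In this sketch, raising `threshold` shrinks the candidate set, which reduces the number of exact distance computations and may lower recall. Note that the candidate set depends on which cell the query falls into, not on which corpus points lie in that cell, which is where it differs from a plain same-cell lookup.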
