An Efficient Framework for Approximate Nearest Neighbor Search on High-Dimensional Multi-metric Data
Abstract: Many objects are multi-modal, consisting of images, videos, documents, etc., where each modality exists in a different metric space. Similarity searches on multi-modal data are widely employed in information retrieval and machine learning. In these applications, each modality is usually represented as dense high-dimensional data (e.g., an embedding vector). This paper addresses the problem of nearest neighbor search (NNS) on high-dimensional multi-metric data. Although some techniques for NNS on multi-metric data exist, they do not consider high-dimensional data or query-dependent weights in multi-metric spaces. Linear scan is a straightforward and comparatively fast algorithm for exact NNS on high-dimensional multi-metric data, but obtaining exact results is still slow on large datasets. We therefore propose an efficient framework for approximate NNS on high-dimensional multi-metric data. For each metric space, we build the same type of proximity graph, one that allows a search to start from any node. An approximate NN is found by traversing the proximity graphs while carefully selecting start nodes to avoid redundant node accesses. We conduct experiments on real-world high-dimensional multi-metric data, and the results demonstrate that our framework outperforms state-of-the-art algorithms.
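To make the linear-scan baseline concrete, the following is a minimal sketch of exact NNS over multi-metric objects. It assumes, purely for illustration, that the multi-metric distance is a weighted sum of per-space distances with weights supplied by the query; the abstract does not state the aggregation function, and the names `multi_metric_distance` and `linear_scan_nn` are hypothetical rather than taken from the paper.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1(a, b):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def multi_metric_distance(obj, query, metrics, weights):
    # ASSUMPTION: distances from the individual metric spaces are combined
    # as a weighted sum, with the weights chosen per query.
    return sum(w * d(o, q) for o, q, d, w in zip(obj, query, metrics, weights))

def linear_scan_nn(dataset, query, metrics, weights):
    """Exact nearest neighbor: evaluate the distance to every object once."""
    best_idx, best_dist = -1, math.inf
    for i, obj in enumerate(dataset):
        dist = multi_metric_distance(obj, query, metrics, weights)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist

# Toy data: each object has one vector per metric space (two spaces here).
dataset = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 4.0], [0.0, 0.0]),
    ([1.0, 0.0], [5.0, 5.0]),
]
query = ([0.0, 0.0], [0.0, 0.0])
metrics = [l2, l1]

# Different query-dependent weights can select different nearest neighbors.
print(linear_scan_nn(dataset, query, metrics, [1.0, 0.0]))
print(linear_scan_nn(dataset, query, metrics, [0.0, 1.0]))
```

Each query costs one full pass over the dataset in every metric space, which is the cost that motivates an approximate, graph-based search.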