Cluster Tree for Nearest Neighbor Search

TMLR Paper3667 Authors

11 Nov 2024 (modified: 15 Nov 2024). Under review for TMLR. CC BY 4.0
Abstract: Tree-based algorithms are an important and widely used class of algorithms for Nearest Neighbor Search (NNS), with the random partition (RP) tree being arguably the most well studied. However, despite its theoretical guarantees and strong practical performance, a major drawback of the RP tree is its lack of adaptability to the input dataset. Inspired by recent theoretical and practical work on NNS, we attempt to remedy this by introducing *ClusterTree*, a new tree-based algorithm. Our approach uses randomness as in RP trees while adapting to the underlying cluster structure of the dataset to create well-balanced and meaningful partitions. Experimental evaluations on real-world datasets demonstrate improvements over RP trees and other tree-based methods for NNS while maintaining efficient construction time. In addition, we show theoretically and empirically that *ClusterTree* finds partitions that are superior to those found by RP trees in preserving the cluster structure of the input dataset.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Arto_Klami1
Submission Number: 3667