Abstract: Tree-based algorithms are an important and widely used class of algorithms for Nearest Neighbor Search (NNS), with the random partition (RP) tree being arguably the most well studied. However, despite possessing theoretical guarantees and strong practical performance, a major drawback of the RP tree is its lack of adaptability to the input dataset. Inspired by recent theoretical and practical works for NNS, we attempt to remedy this by introducing *ClusterTree*, a new tree-based algorithm. Our approach utilizes randomness as in RP trees while adapting to the underlying cluster structure of the dataset to create well-balanced and meaningful partitions. Experimental evaluations on real-world datasets demonstrate improvements over RP trees and other tree-based methods for NNS while maintaining efficient construction time. In addition, we show theoretically and empirically that *ClusterTree* finds partitions which are superior to those found by RP trees in preserving the cluster structure of the input dataset.
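For orientation, here is a minimal, purely illustrative sketch of the kind of split the abstract alludes to: a random projection as in an RP tree, but with the cut placed at the widest gap in the projected values as a crude stand-in for a cluster-preserving criterion. This is not the paper's actual MinCut rule; the function name `cluster_aware_split` and the gap heuristic are our own assumptions.

```python
import numpy as np

def cluster_aware_split(X, rng=None):
    """Split a point set with a random projection, cutting at the widest
    projected gap instead of the median used by a plain RP tree.
    (Illustrative only; the paper's MinCut criterion may differ.)"""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    proj = X @ direction                          # project onto the random direction
    sorted_proj = np.sort(proj)
    gaps = np.diff(sorted_proj)                   # gaps between consecutive projected points
    cut = int(np.argmax(gaps))                    # widest gap: crude proxy for a cluster boundary
    threshold = 0.5 * (sorted_proj[cut] + sorted_proj[cut + 1])
    left = np.flatnonzero(proj <= threshold)
    right = np.flatnonzero(proj > threshold)
    return direction, threshold, left, right
```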
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers again for their excellent remarks, which improved our manuscript during the rebuttal period. We also thank the action editor for granting a time extension that allowed us to finalize a revision covering all of the points the reviewers raised. Below we summarize the modifications introduced in response to the reviewers' requests; we have also edited our responses to each reviewer accordingly.
This submission includes the following modifications:
1. We updated the contribution outline, as requested by reviewer 2srD.
2. We moved the projection-number experiment from the supplementary material into the body of the paper, as an ablation study on the efficiency of ClusterTree. The experiment compares using a single random projection with MinCut against selecting the best projection out of a set of candidate projections (see the hedged sketch after this list). This modification addresses reviewer 2srD's request.
3. We provided a runtime analysis of the speedup achieved by ClusterTree over the RP tree, making explicit the trade-off between compute overhead and the accuracy of ANN recovery. This includes tables reporting the query-time speedup of ClusterTree relative to the RP tree; Appendix F provides analogous speedup tables comparing ClusterTree with the PCA tree and the k-means tree.
4. We moved the previous Section 4 to the appendix and added a relevant experiment, as requested by reviewer ZSdM.
5. We added further experimental results in the appendix at the request of reviewer K1CX, covering 6 additional datasets and bringing the total number of datasets in our experiments to a dozen.
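As referenced in item 2, a hedged sketch of the two variants in the projection-number ablation: a single random projection versus picking the best out of several candidate projections. The split score used below (the widest projected gap) is only a stand-in for the paper's MinCut objective, and `best_of_k_split` and `num_candidates` are hypothetical names introduced for illustration.

```python
import numpy as np

def best_of_k_split(X, num_candidates=5, rng=None):
    """Draw several random directions and keep the one with the best split
    score; num_candidates=1 recovers the single-projection variant.
    (Illustrative proxy; the paper's actual selection criterion may differ.)"""
    rng = np.random.default_rng() if rng is None else rng
    best_score, best = -np.inf, None
    for _ in range(num_candidates):
        direction = rng.normal(size=X.shape[1])
        direction /= np.linalg.norm(direction)
        proj = X @ direction
        sorted_proj = np.sort(proj)
        gaps = np.diff(sorted_proj)               # split-quality proxy: widest projected gap
        cut = int(np.argmax(gaps))
        if gaps[cut] > best_score:
            threshold = 0.5 * (sorted_proj[cut] + sorted_proj[cut + 1])
            best_score, best = gaps[cut], (direction, threshold)
    return best
```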
With these final modifications, we believe we have addressed all of the reviewers' requests and comments in the updated revision. Updated answers to questions and comments can be found in the specific responses to each reviewer.
Supplementary Material: pdf
Assigned Action Editor: ~Arto_Klami1
Submission Number: 3667