A Cluster-Based Approach to kNN Join Over Batch-Dynamic High-Dimensional Data

Nimish Ukey; Guangjian Zhang; Zhengyi Yang; Xiaoyang Wang; Binghao Li; Serkan Saydam; Wenjie Zhang

Published: 01 Jan 2024, Last Modified: 20 May 2025 | ADMA (2) 2024 | CC BY-SA 4.0

Abstract: The k nearest neighbors (kNN) join is a crucial operation in data mining: for each point in a query set, it retrieves the k nearest points from an answer set. This operation has extensive applications in domains such as recommendation systems, spatial databases, and knowledge discovery. With the growth in data volume and dimensionality, numerous approaches have emerged to improve the efficiency of kNN join over static and dynamic high-dimensional data. However, we observe that research on batch-dynamic kNN join, where updates arrive in batches rather than individually, remains scarce. To bridge this gap, we propose a novel cluster-based approach to batch-dynamic kNN join over high-dimensional data. Our contributions include a cluster-based batch update technique, which efficiently processes similar updates in clusters, and a cluster-based pruning method that uses the high-dimensional R-tree (HDR-Tree) to optimise search during updates. Extensive experiments on 6 real-world datasets demonstrate the efficiency of our approach, which outperforms state-of-the-art methods by a factor of 19 to 55.
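To make the operation concrete, the following is a minimal Python sketch, not the authors' method: a brute-force kNN join, plus a hypothetical `insert_batch` helper that folds a batch of newly inserted answer points into existing results. As an illustrative assumption, it summarises the whole batch as a single cluster (centroid c and covering radius r) and applies the triangle-inequality bound dist(q, p) >= dist(q, c) - r to skip queries whose current k-th neighbour is already closer than any batch point could be. The HDR-Tree, deletions, and the paper's grouping of updates into multiple clusters are not modelled; all names here are illustrative.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Neighbours = List[Tuple[float, int]]  # (distance, answer_id), ascending


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_join(queries: Dict[int, Point], answers: Dict[int, Point],
             k: int) -> Dict[int, Neighbours]:
    """Brute-force kNN join: the k closest answer points for every query point."""
    return {
        qid: heapq.nsmallest(k, ((dist(q, a), aid) for aid, a in answers.items()))
        for qid, q in queries.items()
    }


def insert_batch(queries: Dict[int, Point], knn: Dict[int, Neighbours],
                 batch: Dict[int, Point], k: int) -> int:
    """Hypothetical helper: fold a batch of inserted answer points into `knn`
    in place and return how many queries skipped the batch entirely."""
    if not batch:
        return len(queries)
    pts = list(batch.values())
    dim = len(pts[0])
    # Summarise the whole batch as one cluster: centroid plus covering radius.
    centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
    radius = max(dist(centroid, p) for p in pts)
    skipped = 0
    for qid, q in queries.items():
        current = knn[qid]
        kth = current[-1][0] if len(current) == k else math.inf
        # Triangle inequality: every batch point p has dist(q, p) >= dist(q, c) - r.
        # If even that lower bound exceeds the current k-th distance, skip the batch.
        if dist(q, centroid) - radius > kth:
            skipped += 1
            continue
        # Otherwise merge the old top-k with the batch and keep the k smallest.
        candidates = current + [(dist(q, p), pid) for pid, p in batch.items()]
        knn[qid] = heapq.nsmallest(k, candidates)
    return skipped


if __name__ == "__main__":
    queries = {0: (0.0, 0.0), 1: (5.0, 5.0)}
    answers = {10: (1.0, 0.0), 11: (0.0, 2.0), 12: (6.0, 5.0), 13: (5.0, 7.0)}
    knn = knn_join(queries, answers, k=2)
    print(knn[0])  # prints [(1.0, 10), (2.0, 11)]
    far_batch = {20: (50.0, 50.0), 21: (51.0, 50.0)}
    print(insert_batch(queries, knn, far_batch, k=2))  # prints 2
```

The pruning test is strict (`>`), so a batch point tied with the current k-th distance is still merged, and the incremental result matches recomputing the join from scratch over the enlarged answer set. A batch whose points lie far from most queries is rejected with one distance computation per query instead of one per batch point.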

Loading