Abstract: This paper addresses the problem of building the Euclidean minimum spanning tree (EMST) of a high-dimensional dataset. The EMST problem has many applications, including visualization, cosmology, and the chemical sciences. Because machine learning-based embedding techniques now represent many objects as high-dimensional vectors, these applications require an efficient EMST algorithm in high dimensions. Borůvka’s algorithm is a representative approach, but it incurs \(O(n^2\log n)\) time for \(n\) points. State-of-the-art EMST algorithms reduce practical running time, but they assume low-dimensional points. In high dimensions, such techniques generally degrade to exhaustive search, which suggests that computing the exact EMST is computationally expensive. Fortunately, many applications need fast response times and can tolerate approximate results. We therefore propose a new approximate EMST algorithm for high-dimensional points. Experiments on real datasets demonstrate that our algorithm yields an almost correct answer much faster than state-of-the-art algorithms.
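To make the \(O(n^2\log n)\) baseline concrete, the following is a minimal sketch of an exact EMST built with Borůvka's algorithm over the implicit complete graph. It is not the approximate algorithm proposed in the paper; the function name `boruvka_emst` and the brute-force pair scan are assumptions made for illustration. Each round scans all \(O(n^2)\) point pairs to find the lightest edge leaving every component, and since the number of components at least halves per round, there are \(O(\log n)\) rounds.

```python
import math


def boruvka_emst(points):
    """Exact EMST by Boruvka's algorithm with brute-force distance scans.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    Returns a list of (i, j, distance) edges over point indices.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(x):
        # Find the component root, compressing the path by halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    components = n
    while components > 1:
        # cheapest[root] = lightest (distance, i, j) edge leaving that component.
        # Comparing whole tuples breaks distance ties consistently.
        cheapest = {}
        for i in range(n):
            ci = find(i)
            for j in range(i + 1, n):
                cj = find(j)
                if ci == cj:
                    continue  # both endpoints already in the same component
                cand = (math.dist(points[i], points[j]), i, j)
                if ci not in cheapest or cand < cheapest[ci]:
                    cheapest[ci] = cand
                if cj not in cheapest or cand < cheapest[cj]:
                    cheapest[cj] = cand
        # Merge components along the selected edges, skipping duplicates.
        for d, i, j in cheapest.values():
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                edges.append((i, j, d))
                components -= 1
    return edges
```

For example, `boruvka_emst([(0, 0), (1, 0), (3, 0), (3, 4)])` returns three edges. The quadratic scan in every round is the cost that grows prohibitive for large high-dimensional datasets, which is the motivation the abstract gives for an approximate method.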
External IDs: dblp:conf/pakdd/KidoAH25