TL;DR: We design learning-augmented algorithms for metric minimum spanning trees based on growing a spanning tree from an initial forest
Abstract: Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but this takes $\Omega(n^2)$ time to even approximate. We introduce a framework for metric MSTs that first (1) finds a forest of trees using practical heuristics, and then (2) finds a small weight set of edges to connect disjoint components in the forest into a spanning tree. We prove that optimally solving step (2) still takes $\Omega(n^2)$ time, but we provide a subquadratic 2.62-approximation algorithm. In the spirit of learning-augmented algorithms, we then show that if the heuristic forest found in step (1) overlaps with an optimal MST, we can approximate the original MST problem in subquadratic time, where the approximation factor depends on a measure of overlap. In practice, we find nearly optimal spanning trees for a wide range of metrics, while being orders of magnitude faster than exact algorithms.
Lay Summary: Finding a minimum spanning tree for a set of data points is a fundamental computational task that can be used, among other applications, for clustering data into similar groups of points. There are already exact algorithms for solving this problem that rely on computing distances between all pairs of points as a first step. However, this approach is too slow for applications that involve massive datasets, especially when relationships between data points are given by complicated distance functions.
In our work, we design a fast new approach for finding a spanning tree that does not require computing distances between all pairs of data points. The method starts by connecting some subsets of points with a fast machine learning heuristic. It then finds a near optimal way to connect these pieces together to form a full spanning tree. A key contribution of our work is our theoretical analysis: we show that if the starting point computed by the fast machine learning heuristic overlaps well with an optimal minimum spanning tree, then the spanning tree our algorithm produces is provably close to optimal.
Primary Area: General Machine Learning->Scalable Algorithms
Keywords: minimum spanning trees, learning-augmented algorithms, general metric spaces, graph algorithms
Submission Number: 5311
Loading