Fast, Scalable, and Machine-Verified Multicore Disjoint Set Union Data Structures and their Wide Deployment in Parallel Algorithms (Abstract)

Siddhartha Jayanti, Robert E. Tarjan

Published: 2024, Last Modified: 27 Sept 2024HOPC@SPAA 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: We design a simple, fast, scalable, and reliable concurrent disjoint set union (a.k.a. union-find) data structure. Our algorithms are the first scalable algorithms for the problem. Our best algorithm provides almost-linear speed-up, performing just Θ (m ⋅ (log (np/m + 1) + α (n, m/np))) work when p processes execute a total of m operations on an instance with sets of total size n. We give a rigorous, machine-verified proof of correctness, and we prove that the work-complexity is optimal amongst a class of symmetric algorithms, which include all known concurrent set union algorithms. Our algorithms are fast in practice and have seen wide adoption. They are implemented in Google's recently open-sourced graph-mining library, where they enable "parallel clustering algorithms which scale to graphs with tens of billions of edges" [5]. An MIT research group (Dhulipala, Hong, and Shun) independently implemented hundreds of parallel algorithms for connected components and revealed that our algorithms are consistently the fastest both on CPUs [2] and GPUs [6]. As an illustration, our algorithms were used to compute the components of the Hyperlink2012 graph of 128 billion edges in just 8.2 seconds on a standard 72 core machine; this is 3.1x faster than the state-of-the-art in any computational setting [2]. Several state-of-the-art algorithms for parallel clustering [5, 15, 16], graph analysis [2, 14], and model checking [1] rely on our data structures.