Hierarchical Isomerism Distributed Equivalent Union Find for Billion-Scale Disjoint Sets: A Case Study

Liang Chen, Pingchuan Ma, Kai Liu, Liping Yang, Seán McLoone, Yuanjun Miao, Hongbo Liu

Published: 01 Dec 2025, Last Modified: 04 Feb 2026, Venue: Data Science and Engineering, License: CC BY-SA 4.0
Abstract: To monitor clients who may be bypassing position limits, business units apply the disjoint sets principle to identify potential correlations between clients based on account profiles. The key challenge is computing disjoint sets for large-scale topological graphs (millions of nodes and billions of edges) within a short response time. In this article, we propose a multi-DAG indexing algorithm, the Hierarchical Isomerism Distributed Equivalent (HIDE) union find. First, for large-scale topological graphs, we use two new topological structures, the equivalent sub-Directed Acyclic Graph (sub-DAG) and the hierarchical isomerism topological graph, to reduce the number of edges and nodes in the multi-DAG index merging process. Building on these structures, HIDE union find achieves computable splitting across temporal and spatial spans. HIDE union find is theoretically proven to ensure correctness and universality. Experimental validation shows that HIDE union find outperforms previous methods on large-scale topological graphs, achieving response times 100 to 200 times faster than those of the current leading methods.
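For readers unfamiliar with the disjoint sets principle mentioned in the abstract, the following is a minimal sketch of the classical single-machine union find (with path compression and union by size). It is not the HIDE algorithm, whose details are not given here; it only illustrates how accounts that share a link end up in the same set. The class name `UnionFind`, the helper `linked_groups`, and the account labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Classical disjoint-set structure (illustrative baseline, not HIDE)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # First pass: locate the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def linked_groups(edges):
    """Return the disjoint sets induced by undirected edges, sorted."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    groups = defaultdict(list)
    for node in list(uf.parent):
        groups[uf.find(node)].append(node)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical account links: A-B and B-C share a profile attribute, D-E too.
example_edges = [("A", "B"), ("B", "C"), ("D", "E")]
example_groups = linked_groups(example_edges)
```

In this sketch, accounts A, B, and C fall into one set and D and E into another. A structure like this processes edges one at a time on a single machine, which is the setting that becomes costly at billions of edges and that a distributed approach aims to improve on.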