T-FSM: A Scalable Distributed Task-Based System for Frequent Subgraph Pattern Mining from a Big Graph
Abstract: Finding frequent subgraph patterns in a big graph is an important problem with many applications, such as classifying chemical compounds and building indexes to speed up graph queries. Because this problem is NP-hard, several recent parallel and distributed systems have been developed to accelerate the mining. However, they often suffer from huge memory cost, very long running time, suboptimal load balancing, poor scale-out capability, and possibly inaccurate results. In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. The execution engine of T-FSM supports both intra-machine and inter-machine parallelism. For intra-machine parallelism, T-FSM adopts a novel task-based execution model to ensure high multithreading concurrency, bounded memory consumption, and effective load balancing. For inter-machine parallelism, T-FSM ensures good scale-out performance with a lightweight pattern rebalancing approach that reduces the workload skew of pattern evaluations among machines. To avoid recomputing the contexts of migrated patterns, we design a novel context cache table that supports concurrent and asynchronous requesting and caching of remote context data. The table promptly evicts and garbage-collects pattern contexts that are no longer needed, which keeps memory consumption bounded. Extensive experiments show that T-FSM is orders of magnitude faster than existing state-of-the-art parallel systems (more than 10×, 51×, 131×, and 55× speedup over ScaleMine, DistGraph, Pangolin, and Peregrine, respectively) and distributed systems (more than 42× and 88× speedup over ScaleMine and DistGraph, respectively) for frequent subgraph pattern mining, and that it scales out satisfactorily to 512 CPU cores on the Polaris supercomputer at Argonne National Laboratory.
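To make the MNI baseline mentioned in the abstract concrete, the following is a minimal sketch of MNI (minimum image-based) support as it is commonly defined: for each pattern vertex, count the distinct data-graph vertices it is mapped to across all embeddings, then take the minimum of those counts. The function name `mni_support` and the dictionary representation of embeddings are assumptions made for illustration. They are not taken from T-FSM, and the sketch implements neither Fraction-Score nor the embedding enumeration itself.

```python
from collections import defaultdict


def mni_support(embeddings):
    """Return the MNI support of a pattern given its embeddings.

    `embeddings` is an iterable of dicts, each mapping a pattern vertex
    to the data-graph vertex it is matched to (a hypothetical
    representation chosen for this sketch).
    """
    # For every pattern vertex, collect the set of distinct data vertices
    # it is mapped to across all embeddings (its "image set").
    images = defaultdict(set)
    for emb in embeddings:
        for pattern_vertex, data_vertex in emb.items():
            images[pattern_vertex].add(data_vertex)
    # MNI support is the size of the smallest image set; with no
    # embeddings at all, the support is 0.
    return min((len(s) for s in images.values()), default=0)


# A single-edge pattern a-b matched inside a star centered at data vertex 1.
star_embeddings = [
    {"a": 1, "b": 2},
    {"a": 1, "b": 3},
    {"a": 1, "b": 4},
]
support = mni_support(star_embeddings)
```

In this example the pattern has three embeddings, but vertex `a` always maps to data vertex 1, so the MNI support is 1 rather than 3. Taking the minimum over image sets is what makes the measure anti-monotonic: extending a pattern can only shrink or preserve the image set of each existing pattern vertex.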
External IDs: doi:10.1145/3771994