Abstract: Many algorithms have been proposed in the literature for finding the strongly connected components (SCCs) of directed graphs. An SCC of a directed graph $G$ is a maximal subgraph in which every two nodes are reachable from each other. Existing in-memory algorithms are efficient and find all the SCCs of $G$ in time linear in the size of $G$. However, as graphs in real applications grow rapidly, recent work has focused on semi-external algorithms. Existing semi-external algorithms maintain an in-memory sketch $\mathcal{A}$ of $G$ and gradually restructure $\mathcal{A}$ with their in-memory processes (IMPs) until all the SCCs can be computed from $\mathcal{A}$. Their I/O and CPU costs nevertheless remain high when $G$ is relatively large. This paper therefore proposes a new semi-external algorithm, EP-SCC, with a novel IMP, EP-Reduction, for finding all the SCCs of $G$ efficiently. Extensive experiments are conducted on both synthetic and real graphs, among which WDC-2014 contains 1.7 billion nodes and eu-2015 has over 91 billion edges. The experimental results confirm that EP-SCC significantly outperforms existing semi-external SCC algorithms.
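To make the SCC definition and the linear-time in-memory baseline concrete, the following is a minimal sketch of Kosaraju's two-pass algorithm, a standard in-memory method. It is not EP-SCC or EP-Reduction, which the abstract does not specify. The function name `strongly_connected_components`, the integer node labels `0..num_nodes-1`, and the edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict


def strongly_connected_components(num_nodes, edges):
    """Return the SCCs of a directed graph as a list of sorted node lists.

    Illustrative in-memory baseline (Kosaraju's algorithm), assuming nodes
    are labelled 0..num_nodes-1 and the whole graph fits in memory.
    Runs in O(|V| + |E|) time.
    """
    forward = defaultdict(list)   # adjacency lists of G
    backward = defaultdict(list)  # adjacency lists of G with edges reversed
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)

    # Pass 1: iterative DFS on G, recording nodes in order of completion.
    visited = [False] * num_nodes
    finish_order = []
    for root in range(num_nodes):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, iter(forward[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All out-neighbours explored: the node is finished.
                finish_order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    # Each search tree found here is exactly one SCC.
    component = [-1] * num_nodes
    sccs = []
    for root in reversed(finish_order):
        if component[root] != -1:
            continue
        label = len(sccs)
        component[root] = label
        members = [root]
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in backward[node]:
                if component[prev] == -1:
                    component[prev] = label
                    members.append(prev)
                    stack.append(prev)
        sccs.append(sorted(members))
    return sccs


if __name__ == "__main__":
    # Hypothetical toy graph: a 3-cycle, a 2-cycle reachable from it, and a self-loop.
    toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (5, 5)]
    print(sorted(strongly_connected_components(6, toy_edges)))
```

On the toy graph, nodes 0, 1, 2 form one SCC, nodes 3 and 4 form another, and node 5 is an SCC on its own. Both passes need random access to the full adjacency lists, which is the requirement that semi-external algorithms relax by keeping only a sketch of $G$ in memory.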
External IDs: doi:10.1109/tkde.2021.3138994