Abstract: Graph processing suffers from severe locality challenges due to a large number of inefficient, irregular memory accesses, which mainly originate from random accesses to the neighbors of active vertices (a.k.a. frontiers). Graph reordering, which assigns consecutive IDs to vertices that are likely to be accessed consecutively, can improve access locality effectively and has demonstrated significant speedups across various architectures and systems. Existing graph reordering methods primarily use the degree of overlap among in-neighbors to characterize data access locality. However, many graph algorithms activate only a fraction of the vertices in the graph, and that fraction varies substantially across inputs and processing iterations. Many of these vertices are neither connected nor share any neighbors, yet they are processed at the same time and exhibit potential data access locality, which is generally overlooked by prior graph reordering methods. We observe that the data locality between concurrently activated vertices is usually attributable to overlapping $k$-order in-neighbors. Because the number of $k$-order in-neighbors grows explosively, analyzing their overlap directly for graph reordering is unacceptably time-consuming. We therefore propose to replace the overlap calculation of $k$-order in-neighbors with a frontier distribution analysis over a few BFS samplings. Specifically, we first profile how frontiers are distributed across the iterations of different BFS samplings and build a feature vector for each vertex from the iteration in which it is activated in each sampling. On top of these feature vectors, we propose FrontOrder, which uses a customized distance metric to characterize the locality between vertices and leverages $K$-means to cluster vertices with high locality to guide graph reordering.
In addition, FrontOrder accounts for load balance by predicting the runtime computing intensity from the learned vertex clusters. According to our experiments, FrontOrder delivers average speedups of ${2.33\times}$ and ${1.57\times}$ on Ligra and GPOP, respectively, and consistently outperforms state-of-the-art graph reordering methods on a set of representative graph algorithms and datasets, with moderate preprocessing overhead.
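The feature-vector step described in the abstract can be illustrated with a minimal Python sketch. It records, for each sampled BFS source, the iteration in which every vertex is activated, and uses those iteration numbers as the vertex's feature vector. Everything beyond that is an assumption made for illustration: the function names, the toy graph, the use of -1 for unreached vertices, and in particular the final lexicographic sort, which is only a simple stand-in for the customized distance metric and $K$-means clustering that the abstract mentions but does not specify.

```python
from collections import deque


def bfs_levels(adj, source):
    """Return the BFS iteration in which each vertex is activated (-1 if unreached)."""
    levels = [-1] * len(adj)
    levels[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if levels[v] == -1:  # first time v joins a frontier
                levels[v] = levels[u] + 1
                queue.append(v)
    return levels


def feature_vectors(adj, sources):
    """Build one feature vector per vertex: its activation iteration in each BFS sampling."""
    per_source = [bfs_levels(adj, s) for s in sources]
    return [tuple(lv[v] for lv in per_source) for v in range(len(adj))]


def reorder_by_features(features):
    """Assign new IDs so that vertices with similar feature vectors become adjacent.

    Assumption: a lexicographic sort replaces the paper's distance metric and K-means.
    """
    order = sorted(range(len(features)), key=lambda v: features[v])
    new_id = [0] * len(features)
    for new, old in enumerate(order):
        new_id[old] = new
    return new_id


# Hypothetical toy graph (out-neighbor lists): 0->2, 0->4, 1->3, 2->1, 4->1
adj = [[2, 4], [3], [1], [], [1]]
features = feature_vectors(adj, sources=[0, 4])
new_id = reorder_by_features(features)
print(features)
print(new_id)
```

In this toy graph, vertices 2 and 4 are not connected to each other, but both are activated in the first iteration of the BFS from vertex 0, so the sketch places them at adjacent new IDs.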
External IDs:dblp:conf/icde/ZhangLLXZZLL25