Memory-Efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs
Abstract: How can we estimate local triangle counts accurately in a graph stream without storing the whole graph? How can we handle duplicated edges when counting local triangles in a graph stream? Local triangle counting, which computes the number of triangles attached to each node in a graph, is an important problem with wide applications in social network analysis, anomaly detection, web mining, and related areas. In this article, we propose algorithms for local triangle counting in a graph stream based on edge sampling: Mascot for a simple graph, and MultiBMascot and MultiWMascot for a multigraph. To develop Mascot, we first present two naive local triangle counting algorithms for a graph stream, called Mascot-C and Mascot-A. Mascot-C is based on constant edge sampling, and Mascot-A improves its accuracy by using more memory. Mascot combines the memory efficiency of Mascot-C with the accuracy of Mascot-A by counting triangles unconditionally for each new edge, regardless of whether the edge is sampled. Extending this idea to a multigraph, we develop two algorithms, MultiBMascot and MultiWMascot. MultiBMascot enables local triangle counting on the corresponding simple graph of a streamed multigraph without explicit graph conversion; MultiWMascot treats the repeated occurrences of an edge as its weight and counts each triangle as the product of its three edge weights. In contrast to the existing algorithm, which requires prior knowledge of the target graph and appropriately set parameters, our proposed algorithms require only one parameter, the edge sampling probability. Through extensive experiments, we show that for the same number of sampled edges, Mascot provides the best accuracy compared to the existing algorithm as well as Mascot-C and Mascot-A. We also demonstrate that MultiBMascot on a multigraph is comparable to Mascot-C on the counterpart simple graph, and that MultiWMascot becomes more accurate for higher-degree nodes.
Thanks to Mascot, we also discover interesting anomalous patterns in real graphs, including core-peripheries in the web, a bimodal call pattern in a phone call history, and intensive collaboration in DBLP.
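The core idea described for Mascot, counting triangles for every arriving edge and only afterwards deciding whether to keep that edge, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name mascot_sketch, the in-memory adjacency sets, and the 1/p^2 scaling are assumptions made for illustration: the scaling follows from the observation that a triangle closed by a new edge is detected only if both of its earlier edges were sampled, which happens with probability p^2. The sketch also assumes a simple graph stream in which no edge repeats.

```python
import random


def mascot_sketch(edge_stream, p, seed=None):
    """Estimate local triangle counts from a stream of undirected edges.

    Illustrative sketch only: `p` is the edge sampling probability, and each
    detected triangle is weighted by 1 / p**2 (an assumption explained in the
    accompanying text). Assumes no edge appears twice in the stream.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    rng = random.Random(seed)
    sample = {}      # node -> set of neighbors among the sampled edges
    estimate = {}    # node -> estimated number of triangles
    weight = 1.0 / (p * p)

    for u, v in edge_stream:
        if u == v:
            continue  # self-loops cannot be part of a triangle

        # Count unconditionally: every common neighbor in the sample closes
        # a triangle with the new edge, whether or not the edge is kept.
        common = sample.get(u, set()) & sample.get(v, set())
        for w in common:
            for node in (u, v, w):
                estimate[node] = estimate.get(node, 0.0) + weight

        # Only now decide whether the new edge joins the sample.
        if rng.random() < p:
            sample.setdefault(u, set()).add(v)
            sample.setdefault(v, set()).add(u)

    return estimate


if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]
    print(mascot_sketch(stream, p=0.5, seed=0))
```

With p = 1.0 every edge is kept and the weight is 1, so the estimates reduce to exact local triangle counts; smaller values of p trade accuracy for memory.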