Efficient Single-Source SimRank Query by Path Aggregation

Published: 2023, Last Modified: 13 Feb 2024, Venue: KDD 2023, Readers: Everyone
Abstract: A single-source SimRank query computes the similarity between a query node and every node in a graph by traversing the paths that start from the query node. However, the number of paths grows exponentially with path length, which degrades computational efficiency. Sampling-based algorithms reduce the cost by sampling paths, but they must sample enough paths to ensure accuracy, so their performance can still suffer from the large number of paths. In this paper, we propose VecSim, which answers single-source SimRank queries efficiently by path aggregation. VecSim first aggregates, step by step, the paths from the query node that arrive at the same node to obtain the hitting probabilities; it then aggregates the paths starting from the arrived nodes in the reverse direction to obtain the first-meeting probabilities in a similar way. Only a few vectors are maintained throughout. The extra-meeting probabilities are excluded at each step, and we design an efficient sampling-based algorithm that estimates them by sampling paths up to a specified length. To further speed up query processing, we propose a threshold-sieved algorithm that prunes entries whose values fall below a threshold and therefore contribute little to the final similarity scores. Extensive experiments on four small and four large graphs demonstrate that VecSim outperforms its competitors in time and space cost at comparable accuracy. In particular, VecSim achieves an empirical error at the 10^-4 level in under 0.1 seconds on all of these graphs.
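To make the idea of aggregating paths by their arrival node concrete, here is a minimal sketch of the first stage only (hitting probabilities), with an optional pruning threshold. It is not the VecSim implementation. It assumes the formulation common in the SimRank literature in which a reverse random walk moves to a uniformly chosen in-neighbor and survives each step with probability sqrt(c). The function name `hitting_probabilities`, the decay factor `c = 0.6`, and the toy graph are all hypothetical choices made for illustration.

```python
import math


def hitting_probabilities(in_neighbors, query, steps, c=0.6, threshold=0.0):
    """Propagate one sparse vector per step instead of enumerating paths.

    Returns [h_0, ..., h_steps], where h_l[w] is the probability that a
    sqrt(c)-decayed reverse random walk from `query` is at node w after
    l steps (assumed formulation, see the lead-in).
    """
    decay = math.sqrt(c)
    current = {query: 1.0}
    levels = [current]
    for _ in range(steps):
        nxt = {}
        for node, prob in current.items():
            preds = in_neighbors.get(node, [])
            if not preds:
                continue  # the walk stops at nodes without in-neighbors
            share = decay * prob / len(preds)
            for pred in preds:
                # Paths that arrive at the same node are merged into one entry.
                nxt[pred] = nxt.get(pred, 0.0) + share
        # Threshold sieve: keep only entries strictly above the threshold.
        current = {n: p for n, p in nxt.items() if p > threshold}
        levels.append(current)
    return levels


# Toy graph (hypothetical): in_neighbors[v] lists the nodes with an edge into v.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

exact = hitting_probabilities(toy_graph, "a", steps=2)
sieved = hitting_probabilities(toy_graph, "a", steps=2, threshold=0.5)
print(exact)
print(sieved)
```

In the toy graph, the two reverse paths a, b, d and a, c, d both end at d, so the second-step vector holds a single entry for d rather than one entry per path. With the threshold set to 0.5, both first-step entries are pruned and nothing propagates further, which shows how sieving trades accuracy for less work.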