A space and time efficient algorithm for SimRank computation

Published: 2012, World Wide Web 2012. Last Modified: 16 May 2023
Abstract: SimRank has become an important similarity measure for ranking web documents, based on a graph model of hyperlinks. Existing approaches to SimRank computation adopt an iterative paradigm. The most efficient deterministic technique takes $O(n^3)$ worst-case time per iteration and requires $O(n^2)$ space, where $n$ is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes $O(\min\{n \cdot m, n^r\})$ time and $O(n + m)$ space, where $m$ is the number of edges in the web-graph model and $r \le \log_2 7$. In addition, we extend the similarity transition matrix to prevent random surfers from getting stuck, and devise a pruning technique that eliminates impractical similarities in each iteration. Moreover, we develop a reordering technique combined with an over-relaxation method, which not only speeds up the convergence rate of the existing techniques but also achieves I/O efficiency. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.
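To make the $O(n \cdot m)$ term concrete, one standard way to write the SimRank iteration in matrix form is $S_{k+1} = \max(C \cdot Q S_k Q^{T}, I_n)$ (elementwise max), where row $a$ of $Q$ puts weight $1/|I(a)|$ on each in-neighbor of node $a$. The Python sketch below illustrates this under assumptions and is not the paper's algorithm: the function name `simrank`, the in-neighbor list input `in_nbrs`, and the defaults `c=0.8` and `iterations=10` are hypothetical. It multiplies the dense $S_k$ by the sparse $Q$ twice, so each iteration does about $n \cdot m$ multiply-adds. It still stores $S_k$ densely, so it does not reproduce the $O(n + m)$ space bound claimed in the abstract.

```python
def simrank(in_nbrs, c=0.8, iterations=10):
    """Iterate S <- max(c * Q S Q^T, I) elementwise, with dense S and sparse Q.

    in_nbrs[a] lists the in-neighbors of node a (nodes are 0..n-1). Row a of Q
    holds 1/|I(a)| for each in-neighbor of a, or is all zeros if I(a) is empty.
    """
    n = len(in_nbrs)
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # T = S Q^T, i.e. T[x][b] = (1/|I(b)|) * sum of S[x][j] over j in I(b).
        # Column b costs n * |I(b)| additions, so all columns together cost n * m.
        t = [[0.0] * n for _ in range(n)]
        for b in range(n):
            if in_nbrs[b]:
                w = 1.0 / len(in_nbrs[b])
                for x in range(n):
                    t[x][b] = w * sum(s[x][j] for j in in_nbrs[b])
        # S' = c * Q T, i.e. S'[a][b] = (c/|I(a)|) * sum of T[i][b] over i in I(a).
        # Again n * m additions in total.
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            if in_nbrs[a]:
                w = c / len(in_nbrs[a])
                for b in range(n):
                    new[a][b] = w * sum(t[i][b] for i in in_nbrs[a])
        # The elementwise max with I only resets the diagonal to 1,
        # because every other entry stays below 1 when 0 < c < 1.
        for a in range(n):
            new[a][a] = 1.0
        s = new
    return s


# Node 0 links to nodes 1 and 2, so 1 and 2 share their only in-neighbor.
s = simrank([[], [0], [0]])
print(s[1][2])  # 0.8
```

Keeping $Q$ as in-neighbor lists rather than a dense matrix is what turns each product into roughly $n \cdot m$ work instead of $n^3$.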
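The $n^r$ term with $r \le \log_2 7$ corresponds to computing $Q S_k Q^{T}$ with fast dense matrix multiplication. Strassen's algorithm is the classic method with this exponent. It replaces the eight half-size products of block multiplication with seven, so $T(n) = 7\,T(n/2) + O(n^2)$, which gives $T(n) = O(n^{\log_2 7}) \approx O(n^{2.807})$. The abstract does not say which fast multiplication algorithm the paper relies on, so the Python sketch below is only a textbook Strassen. The helper names (`strassen`, `multiply`, `_add`, `_sub`) are hypothetical, and inputs are zero-padded to the next power of two.

```python
def _add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def _sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def strassen(a, b):
    """Multiply two n x n matrices (n a power of two) using seven recursive products."""
    n = len(a)
    if n == 1:
        return [[a[0][0] * b[0][0]]]
    h = n // 2

    def quad(m, r, c):
        # The h x h block of m whose top-left corner is (r, c).
        return [row[c:c + h] for row in m[r:r + h]]

    a11, a12, a21, a22 = quad(a, 0, 0), quad(a, 0, h), quad(a, h, 0), quad(a, h, h)
    b11, b12, b21, b22 = quad(b, 0, 0), quad(b, 0, h), quad(b, h, 0), quad(b, h, h)
    # Seven half-size products instead of the naive eight.
    m1 = strassen(_add(a11, a22), _add(b11, b22))
    m2 = strassen(_add(a21, a22), b11)
    m3 = strassen(a11, _sub(b12, b22))
    m4 = strassen(a22, _sub(b21, b11))
    m5 = strassen(_add(a11, a12), b22)
    m6 = strassen(_sub(a21, a11), _add(b11, b12))
    m7 = strassen(_sub(a12, a22), _add(b21, b22))
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    # Stitch the four result quadrants back into one matrix.
    return [r1 + r2 for r1, r2 in zip(c11, c12)] + [r1 + r2 for r1, r2 in zip(c21, c22)]


def multiply(a, b):
    """Zero-pad n x n inputs to the next power of two, multiply, and crop the result."""
    n = len(a)
    size = 1
    while size < n:
        size *= 2

    def pad(m):
        return [row + [0] * (size - n) for row in m] + [[0] * size for _ in range(size - n)]

    c = strassen(pad(a), pad(b))
    return [row[:n] for row in c[:n]]


print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

For clarity the recursion goes all the way down to 1×1 blocks. A practical implementation would usually switch to ordinary multiplication below some cutoff size.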
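The abstract does not state the criterion behind the pruning step. As one hypothetical interpretation, the sketch below stores similarity scores sparsely and, after each iteration, drops every off-diagonal score smaller than a threshold `eps`. Both `eps` and the name `simrank_pruned` are made up for illustration. Updates are pushed from each stored pair $(i, j)$ to every pair $(a, b)$ with $i \in I(a)$ and $j \in I(b)$. This is the same SimRank recurrence, restricted to the nonzero terms.

```python
from collections import defaultdict


def simrank_pruned(in_nbrs, c=0.8, iterations=10, eps=1e-4):
    """Sparse SimRank sketch that keeps only scores >= eps (hypothetical pruning rule).

    in_nbrs[a] lists the in-neighbors of node a; scores live in a dict keyed by (a, b).
    """
    n = len(in_nbrs)
    out_nbrs = [[] for _ in range(n)]
    for node, preds in enumerate(in_nbrs):
        for u in preds:
            out_nbrs[u].append(node)
    s = {(a, a): 1.0 for a in range(n)}
    for _ in range(iterations):
        acc = defaultdict(float)
        # Push each stored score s(i, j) to every pair (a, b) with i in I(a) and j in I(b).
        for (i, j), v in s.items():
            for a in out_nbrs[i]:
                for b in out_nbrs[j]:
                    if a != b:
                        acc[(a, b)] += c * v / (len(in_nbrs[a]) * len(in_nbrs[b]))
        # Pruning step: discard off-diagonal scores below eps, then restore s(a, a) = 1.
        s = {pair: val for pair, val in acc.items() if val >= eps}
        s.update({(a, a): 1.0 for a in range(n)})
    return s


scores = simrank_pruned([[], [0], [0]], eps=0.0)
print(scores[(1, 2)])  # 0.8
```

With `eps=0.0` every nonzero score is kept, so the sketch matches the unpruned iteration. Larger thresholds trade accuracy for memory, which then grows with the number of retained pairs rather than with $n^2$.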
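The abstract also does not describe how the over-relaxation method is applied to SimRank. To show only the general idea, here is textbook successive over-relaxation (SOR) for a linear system $A x = b$. Each sweep computes the Gauss-Seidel value $\tilde{x}_i$, using entries already updated in the same sweep, and then moves past it by a factor $\omega$: $x_i \leftarrow (1-\omega)\,x_i + \omega\,\tilde{x}_i$. Because updated entries are reused immediately, the order in which unknowns are visited affects convergence. The names `sor`, `poisson_1d`, `omega`, `tol`, and `max_iter` are illustrative. The 1-D Poisson test matrix is an assumption, chosen because its optimal factor $\omega = 2/(1 + \sin(\pi/(n+1)))$ is known in closed form.

```python
import math


def sor(a, b, omega=1.5, tol=1e-10, max_iter=10_000):
    """Solve a x = b by successive over-relaxation; returns (x, sweeps used)."""
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_delta = 0.0
        for i in range(n):
            # Entries x[j] with j < i were already updated in this sweep (Gauss-Seidel style).
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            new_xi = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_delta = max(max_delta, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_delta < tol:
            return x, sweep
    return x, max_iter


def poisson_1d(n):
    """Tridiagonal matrix with 2 on the diagonal and -1 next to it (symmetric positive definite)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]


n = 20
a, b = poisson_1d(n), [1.0] * n
omega_opt = 2.0 / (1.0 + math.sin(math.pi / (n + 1)))
x_gs, sweeps_gs = sor(a, b, omega=1.0)  # omega = 1 is plain Gauss-Seidel
x_sor, sweeps_sor = sor(a, b, omega=omega_opt)
print(sweeps_gs, sweeps_sor)  # over-relaxation needs far fewer sweeps here
```

On this matrix, the near-optimal $\omega$ should converge in a small fraction of the Gauss-Seidel sweeps. This is the kind of speed-up the abstract attributes to over-relaxation, although the paper's SimRank-specific scheme may differ.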