Abstract: Expert finding is an important technique for ranking users by authority on community question answering (CQA) websites. ZhihuRank is a topic-sensitive expert finding algorithm based on both LDA and PageRank. As the number of participants and documents on CQA websites grows rapidly, how to parallelize expert finding algorithms for big data analysis has received significant attention. In this paper, we find that the Spark framework, a memory-based parallel computing model that supports complicated iterative algorithms, is more suitable for parallelizing expert finding algorithms than the MapReduce framework. As an example, we parallelize ZhihuRank using MLlib's LDA and GraphX's PageRank in Spark. Experiments were conducted on large-scale real data from Zhihu (the most popular CQA website in China), and the results confirm the effectiveness and scalability of the proposed approach.
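The abstract does not give ZhihuRank's formulas, so the following is only a rough, single-machine sketch of the general idea behind a topic-sensitive PageRank, not the authors' algorithm and not the Spark implementation. It assumes that an edge points from an asker to an answerer, and that each user has a non-negative topic weight (for example, a per-topic probability that an LDA model might produce). The function name, parameters, and sample data are all hypothetical.

```python
def topic_sensitive_pagerank(edges, topic_weight, damping=0.85, iterations=50):
    """Rank users for one topic with a personalized PageRank.

    edges        -- list of (asker, answerer) pairs (assumed edge direction)
    topic_weight -- dict mapping user -> non-negative weight for the topic
                    (assumed to come from a topic model such as LDA)
    """
    users = sorted(set(topic_weight) | {u for edge in edges for u in edge})
    total = sum(topic_weight.get(u, 0.0) for u in users)
    if total <= 0:
        raise ValueError("at least one user needs a positive topic weight")

    # Teleport distribution: random jumps favor users active in the topic.
    teleport = {u: topic_weight.get(u, 0.0) / total for u in users}

    out_links = {u: [] for u in users}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) * teleport[u] for u in users}
        for u in users:
            if out_links[u]:
                # Split this user's rank evenly over the users they point to.
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
            else:
                # Dangling user: redistribute rank via the teleport distribution.
                for v in users:
                    new_rank[v] += damping * rank[u] * teleport[v]
        rank = new_rank
    return rank


# Hypothetical toy data: "a" and "b" asked questions that "c" answered,
# and "c" asked a question that "a" answered.
sample_edges = [("a", "c"), ("b", "c"), ("c", "a")]
sample_weights = {"a": 0.2, "b": 0.3, "c": 0.5}
ranks = topic_sensitive_pagerank(sample_edges, sample_weights)
```

Each iteration only needs the previous rank vector and the edge list, which is the kind of repeated pass over the same data that benefits from keeping the graph in memory between iterations.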