Real-time detection and trend tracing of burst topics based on the Negative Binomial Distribution on Spark

Abstract: Social networks have evolved into a popular platform for information and communication, and the vast amount of data they generate changes and spreads rapidly. It is therefore essential to detect and trace major events and burst topics in massive social network data using real-time, parallel Big Data computing. In this paper, we propose a model that uses the Negative Binomial Distribution to fit the distribution of Weibo topic words. We then introduce the concepts of the ‘hot degree’ and the ‘dispersion degree’ of a topic, together with methods for computing them, and validate the efficiency of the model on real data. Furthermore, we design a topic detection and trend-tracing algorithm for stream data and implement it on Spark Streaming, a stream processing framework based on in-memory computing. Finally, experiments on real data demonstrate that our proposal is effective and efficient in tracking burst events.
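The abstract does not give the formulas for the ‘hot degree’ or the ‘dispersion degree’, so the following is only a minimal sketch of the general idea behind Negative Binomial burst scoring, not the authors' method. It assumes that a topic word's per-window counts are overdispersed (variance greater than mean), fits the distribution by the method of moments, and flags the current window as a burst when its count falls in the upper tail below a hypothetical threshold `alpha`. The function names `fit_negative_binomial`, `nb_pmf`, `nb_upper_tail`, and `is_burst` are illustrative, and the code is plain Python rather than Spark Streaming.

```python
import math
from statistics import mean, pvariance


def fit_negative_binomial(counts):
    """Method-of-moments fit of NB(r, p), where K counts failures before r successes.

    For this parameterisation, mean = r(1-p)/p and variance = r(1-p)/p^2,
    so p = mean/variance and r = mean^2 / (variance - mean).
    """
    m = mean(counts)
    v = pvariance(counts, mu=m)
    if v <= m:
        # Not overdispersed: a Poisson model would fit as well or better.
        raise ValueError("counts are not overdispersed; NB fit is undefined")
    p = m / v
    r = m * m / (v - m)
    return r, p


def nb_pmf(k, r, p):
    """P(K = k) = Gamma(k+r) / (Gamma(r) k!) * p^r * (1-p)^k, computed in log space."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_upper_tail(x, r, p):
    """P(K >= x) for K ~ NB(r, p)."""
    if x <= 0:
        return 1.0
    cdf = sum(nb_pmf(k, r, p) for k in range(x))
    return max(0.0, 1.0 - cdf)  # clamp tiny negative values from rounding


def is_burst(history, current_count, alpha=0.01):
    """Hypothetical burst rule: the current count is improbably high under the fitted NB."""
    r, p = fit_negative_binomial(history)
    return nb_upper_tail(current_count, r, p) < alpha


if __name__ == "__main__":
    # Made-up per-window counts of one topic word, for illustration only.
    history = [2, 3, 5, 1, 8, 2, 4, 3, 6, 2]
    print(fit_negative_binomial(history))
    print(is_burst(history, 30), is_burst(history, 4))
```

In a streaming setting, the history would be a sliding window of recent counts per word, and the fit and tail test would be rerun as each new batch arrives. Using the tail probability rather than a raw count threshold lets words with naturally bursty (high-variance) usage avoid being flagged as often as steady words.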