Wb-MSF: A Large-scale Multi-source Information Diffusion Dataset for Social Information Diffusion Prediction

Abstract: Recently, a large number of social network studies have focused on the diffusion of information posted by individual users, which has created a strong demand for social network datasets. Nevertheless, most of the available datasets were published nearly a decade ago, and their scale is not large enough. Moreover, they ignore the multiple posts that different users originate spontaneously under the same topic; together, these posts form a kind of multi-source information. This paper presents Wb-MSF, a large-scale dataset that contains multi-source information cascades and user followership. Unlike existing datasets used in information diffusion tasks, Wb-MSF is the first multi-source information dataset, and it further provides a followership network. Wb-MSF is crawled from the well-known social platform Sina-Weibo and contains tens of millions of followership edges and tens of thousands of information cascades formed by millions of users. It can support the information diffusion prediction problem. Our discussions and experiments in this paper are based on this problem: we carry out a statistical analysis of the dataset, examine the difference between single-source and multi-source information, and examine the effect of the followership network.