Abstract: Information diffusion prediction, a fundamental task in social network analysis, aims to identify the users who are likely to participate in an information diffusion process. Most existing works learn user representations from collected social network data and then perform downstream prediction tasks. However, because of data privacy protection and low data quality, these methods are often limited by weak information in the social network data: incomplete network structure, sparse labels, and insufficient features all severely obstruct user representation learning. To mitigate these issues, we design an effective two-stage method, MGCL. In the first stage, an enhanced representation is learned for every user even when the social network carries only weak information. A multiplex heterogeneous network is adaptively constructed to enrich the social network information. To facilitate user representation learning under sparse labels and insufficient features, we further propose a self-supervised training scheme specifically tailored to social networks with weak information. In the second stage, cascade representations are learned with a multi-head self-attention network for information diffusion prediction. Extensive experiments on four real-world datasets validate that MGCL consistently outperforms state-of-the-art methods.