Keywords: Bayesian nonparametrics, network analysis, stochastic blockmodels, generalized gamma process, completely random measures
TL;DR: We propose a model for sparse graphs where observed connections depend on each node's sociability and mixed community memberships; our formulation allows learning the number of communities from the data and enables efficient Monte Carlo methods.
Abstract: Network models for exchangeable arrays, including most stochastic block models, generate dense graphs and therefore have limited ability to capture many characteristics of real-world social and biological networks. A class of models based on completely random measures, such as the generalized gamma process (GGP), has recently addressed some of these limitations. We propose a framework for thinning edges from realizations of GGP random graphs that models observed links via nodes' overall propensity to interact, as well as the similarity of node memberships within a large set of latent communities. Our formulation allows us to learn the number of communities from data, and enables efficient Monte Carlo methods that scale linearly with the number of observed edges, and thus (unlike dense block models) sub-quadratically with the number of entities or nodes. We compare against alternative models for both dense and sparse networks, and demonstrate effective recovery of latent community structure for real-world networks with thousands of nodes.
Supplementary Material: pdf