Estimating Distributions of Large Graphs from Incomplete Sampled Data

Shiju Li, Xin Huang, Chul-Ho Lee

2021 (modified: 12 Nov 2022), Networking 2021

Abstract: We study the problem of estimating the latent in-degree distribution of large directed graphs from random samples, when the samples reveal only some of the incoming edges of each node and the sampled distribution is therefore far from the original one. While this problem can be cast as an inverse problem, it is often ill-posed, which leads to poor estimation performance. A few recent studies have attempted to overcome this problem, proposing a constrained, penalized weighted least squares estimator and an asymptotic estimator. These recent estimators, however, are either computationally expensive or limited to estimating the tail distribution, and their performance may not be satisfactory. In this paper, we formulate the problem as a maximum-likelihood estimation problem. We then employ the expectation-maximization algorithm to solve this problem and derive a simple iterative estimator, which is easy to implement and computationally fast. Finally, we empirically demonstrate that our estimator is significantly more accurate than the state-of-the-art estimators and that it can be further improved with a proper choice of its parameter.
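To make the maximum-likelihood and expectation-maximization idea concrete, here is a minimal sketch. It is not the authors' estimator, since the abstract gives neither the sampling model nor the update rule. It assumes, purely for illustration, that each incoming edge of a sampled node is observed independently with a known probability `p`, so that the observed in-degree given a true in-degree `i` is Binomial(`i`, `p`), and that the true in-degree is bounded by a chosen `max_degree`. All function and parameter names are hypothetical.

```python
import math


def binom_pmf(j, i, p):
    """P(observed in-degree = j | true in-degree = i), ASSUMING each
    incoming edge is retained independently with probability p."""
    if j > i:
        return 0.0
    return math.comb(i, j) * p**j * (1 - p) ** (i - j)


def log_likelihood(observed_counts, theta, p):
    """Log-likelihood of the observed in-degree counts under theta."""
    ll = 0.0
    for j, n_j in enumerate(observed_counts):
        if n_j:
            mix = sum(theta[i] * binom_pmf(j, i, p) for i in range(len(theta)))
            ll += n_j * math.log(mix)
    return ll


def em_degree_distribution(observed_counts, p, max_degree, iters=200):
    """Generic EM for mixing proportions with known component laws.

    observed_counts[j] is the number of sampled nodes whose observed
    in-degree is j. Returns theta, where theta[i] estimates the
    probability that a node's true in-degree is i.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if max_degree < len(observed_counts) - 1:
        raise ValueError("max_degree must cover the largest observed degree")
    n = sum(observed_counts)
    if n == 0:
        raise ValueError("observed_counts must contain at least one node")

    size = max_degree + 1
    # Precompute the sampling likelihoods B[j][i] once.
    B = [[binom_pmf(j, i, p) for i in range(size)]
         for j in range(len(observed_counts))]
    theta = [1.0 / size] * size  # uniform starting point

    for _ in range(iters):
        new_theta = [0.0] * size
        for j, n_j in enumerate(observed_counts):
            if n_j == 0:
                continue
            # E-step: posterior over the true degree i given observed j.
            denom = sum(theta[i] * B[j][i] for i in range(size))
            for i in range(size):
                new_theta[i] += n_j * theta[i] * B[j][i] / denom
        # M-step: renormalize the expected counts.
        theta = [t / n for t in new_theta]
    return theta
```

For example, `em_degree_distribution([1, 4, 6, 4, 1], p=0.5, max_degree=4)` returns a length-5 probability vector. Each iteration cannot decrease the likelihood, which is the standard EM guarantee, but convergence can be slow when the maximizer lies on the boundary of the probability simplex.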
