TL;DR: We present the optimal sample complexity for correlation detection in the Gaussian Wigner model.
Abstract: Correlation analysis is a fundamental step in uncovering meaningful insights from complex datasets. In this paper, we study the problem of detecting correlations between two random graphs following the Gaussian Wigner model with unlabeled vertices. Specifically, the task is formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are independent, while under the alternative hypothesis, they are edge-correlated through a latent vertex permutation, yet maintain the same marginal distributions as under the null. We focus on the scenario where two induced subgraphs, each with a fixed number of vertices, are sampled. We determine the optimal rate for the sample size required for correlation detection, derived through an analysis of the conditional second moment. Additionally, we propose an efficient approximate algorithm that significantly reduces running time.
Lay Summary: We show how to reliably detect whether two random networks are correlated, even when the node correspondence is hidden. Our work identifies the smallest data size needed for this task and introduces an efficient algorithm to perform the test quickly.
Primary Area: General Machine Learning->Sequential, Network, and Time Series Modeling
Keywords: Correlation detection, Gaussian Wigner model, graph sampling, sample complexity, induced subgraphs, efficient algorithm
Submission Number: 6584
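Illustrative sketch (not from the paper): the hypothesis test described in the abstract can be simulated with a few lines of NumPy. The sketch assumes the common parameterization of the correlated Gaussian Wigner model, in which matched edge weights are standard normal with correlation `rho`; the abstract does not state the parameterization, so `rho`, `n`, `s`, and the helper names `wigner`, `sample_pair`, and `induced_subgraph` are hypothetical. It only generates data under the two hypotheses and samples induced subgraphs; it does not implement the paper's test statistic or its approximate algorithm.

```python
import numpy as np


def wigner(n, rng):
    """Symmetric n x n weight matrix: i.i.d. N(0, 1) above the diagonal, zero diagonal."""
    upper = np.triu(rng.standard_normal((n, n)), k=1)
    return upper + upper.T


def sample_pair(n, rho, correlated, rng):
    """Draw two weighted graphs under the null (independent) or the alternative.

    Assumed model: under the alternative, b[perm[i], perm[j]] and a[i, j] are
    jointly Gaussian with correlation rho, for a hidden uniform permutation perm.
    Both graphs have the same N(0, 1) edge marginals under either hypothesis.
    """
    a = wigner(n, rng)
    z = wigner(n, rng)
    if not correlated:
        return a, z
    perm = rng.permutation(n)  # latent vertex correspondence
    b_aligned = rho * a + np.sqrt(1.0 - rho**2) * z
    b = np.empty_like(b_aligned)
    b[np.ix_(perm, perm)] = b_aligned  # hide the correspondence by relabeling
    return a, b


def induced_subgraph(w, s, rng):
    """Induced subgraph on s vertices chosen uniformly without replacement."""
    idx = rng.choice(w.shape[0], size=s, replace=False)
    return w[np.ix_(idx, idx)]


rng = np.random.default_rng(0)
a, b = sample_pair(n=200, rho=0.9, correlated=True, rng=rng)
a_sub = induced_subgraph(a, s=40, rng=rng)
b_sub = induced_subgraph(b, s=40, rng=rng)
```

The two subgraphs are sampled independently, so only the vertices that happen to land in both samples (after applying the hidden permutation) carry any correlation signal. This is why the sample size `s` is the quantity of interest.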