The graph alignment problem: fundamental limits and efficient algorithms. (Alignement de graphes : limites fondamentales et algorithmes efficaces)

Abstract: This thesis focuses on statistical inference in graphs (or matrices) in high dimension and studies the graph alignment problem, which aims to recover a hidden underlying matching between the nodes of two correlated random graphs. As with many other inference problems in planted models, we are interested in understanding the fundamental information-theoretic limits as well as the computational hardness of graph alignment. First, we study the Gaussian setting, where the graphs are complete and the signal lies in correlated Gaussian edge weights. We prove that the exact recovery task exhibits a sharp information-theoretic threshold, characterize it, and study a simple and natural spectral method for recovery, EIG1, which aligns the leading eigenvectors of the adjacency matrices of the two graphs. While most recent work on the subject has been dedicated to recovering the hidden signal in dense graphs, we next explore graph alignment in the sparse regime, where the mean degrees are constant and do not scale with the graph size. In this particularly challenging setting, for sparse Erdős-Rényi graphs, only a fraction of the nodes can be correctly matched by any algorithm. Our second contribution is an information-theoretic result which characterizes a regime where even this partial alignment is impossible, and gives upper bounds on the reachable overlap between any estimator and the true planted matching. We next propose an algorithm that performs partial alignment, NTMA, which is based on a measure of similarity – called the tree matching weight – between tree-like neighborhoods of the nodes in the graphs. This local approach in the sparse regime leads us to study a related problem: correlation detection in random unlabeled trees. This hypothesis testing problem consists in testing whether two trees are correlated or independent.
The tree matching weight yields a first method for this question as well; another contribution is to study an optimal test based on the likelihood ratio. In a correlated Galton-Watson model, which is well known to be the local approximation of sparse Erdős-Rényi graphs, we characterize the regimes of performance of this test. Finally, we come back to graph alignment and propose a message-passing algorithm, MPAlign, naturally inspired by the study of the related problem on trees. We analyze this algorithm and prove that it recovers a fraction of the planted signal in some parameter regimes.
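
To make the spectral idea behind EIG1 concrete, here is a minimal Python sketch of leading-eigenvector alignment in a correlated Gaussian model. It is an illustration under stated assumptions, not the thesis's implementation: the function names `gaussian_pair` and `eig1`, the correlation parameter `rho`, the exact form of the model, and the way the sign ambiguity of the eigenvectors is resolved (keeping the candidate matching with the larger weight correlation) are all choices made here for illustration.

```python
import numpy as np


def gaussian_pair(n, rho, rng):
    """Sample two correlated symmetric Gaussian weight matrices.

    Assumed model (for illustration): b[i, j] = rho * a[pi[i], pi[j]] + noise,
    so node i of the second graph corresponds to node pi[i] of the first.
    """
    def sym():
        g = rng.standard_normal((n, n))
        return (g + g.T) / np.sqrt(2)

    a = sym()
    noise = sym()
    pi = rng.permutation(n)
    b = rho * a[np.ix_(pi, pi)] + np.sqrt(1.0 - rho**2) * noise
    return a, b, pi


def eig1(a, b):
    """Estimate the matching by aligning the sorted leading eigenvectors."""
    # eigh returns eigenvalues in ascending order: the last column is the
    # eigenvector associated with the largest eigenvalue.
    v_a = np.linalg.eigh(a)[1][:, -1]
    v_b = np.linalg.eigh(b)[1][:, -1]
    order_a = np.argsort(v_a)

    best, best_score = None, -np.inf
    # Eigenvectors are defined up to sign, so both orientations are tried.
    for sign in (1.0, -1.0):
        order_b = np.argsort(sign * v_b)
        est = np.empty(len(v_a), dtype=int)
        # The k-th smallest entry of v_b is matched to the k-th smallest of v_a.
        est[order_b] = order_a
        # Keep the candidate whose relabeled weights correlate best with b.
        score = np.sum(a[np.ix_(est, est)] * b)
        if score > best_score:
            best, best_score = est, score
    return best
```

For example, `a, b, pi = gaussian_pair(200, 0.999, np.random.default_rng(0))` followed by `np.mean(eig1(a, b) == pi)` gives the fraction of correctly matched nodes; how close `rho` must be to 1 for exact recovery is precisely the kind of question the thesis addresses.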