A CLT for Polynomial GNNs on Community-Based Graphs

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Graph Neural Networks, Neighbor Aggregation, Convergence of Measures, Central Limit Theorem, GNN Oversmoothing, Stochastic Block Model
TL;DR: Graph convolutions on node features admit a natural central limit theorem, with consequences for multi-class node classification.
Abstract: We consider the empirical distribution of the embeddings produced by a $k$-layer polynomial GNN on a semi-supervised node classification task and prove a central limit theorem for it. Assuming a community-based model for the underlying graph with growing average degree $\nu_n\to\infty$, we show that the empirical distribution of the centered features, when scaled by $\nu_{n}^{k-1/2}$, converges in 1-Wasserstein distance to a centered stable mixture of multivariate normal distributions. In addition, the joint empirical distribution of the uncentered features and labels, when normalized by $\nu_n^k$, approaches that of a mixture of multivariate normal distributions with stable means and with covariance matrices vanishing as $\nu_n^{-1}$. We explicitly identify the asymptotic means and covariances, showing that the mixture collapses towards a one-dimensional version as $k$ increases. Our results provide a precise and nuanced lens on how oversmoothing presents itself in the large-graph limit in the sparse regime. In particular, we show that training with cross-entropy on these embeddings is asymptotically equivalent to training on these nearly collapsed Gaussian mixtures.
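To make the abstract's scaling concrete, here is a minimal, hypothetical simulation sketch (not the authors' code): it samples a two-community stochastic block model in the sparse regime, applies $k$ rounds of the simplest degree-1 polynomial aggregation $H = A^k X$ as a stand-in for a polynomial GNN, and rescales the per-community centered embeddings by $\nu_n^{k-1/2}$. All names and parameter choices (`n`, `p_in`, `p_out`, the Gaussian feature model) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration of the abstract's setting, not the paper's code:
# two-community SBM with average degree nu_n, k rounds of linear aggregation.
n, k = 2000, 2
labels = rng.integers(0, 2, size=n)            # community assignments
p_in, p_out = 20 / n, 5 / n                    # sparse regime, nu_n ~ 12.5
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric adjacency, no self-loops

X = rng.normal(size=(n, 1)) + labels[:, None]  # community-dependent features
H = X
for _ in range(k):                             # k-layer aggregation: H = A^k X
    H = A @ H

nu = A.sum() / n                               # empirical average degree
# Per community, the centered embeddings rescaled by nu^(k - 1/2) should look
# approximately Gaussian, in the spirit of the CLT claimed in the abstract.
for c in (0, 1):
    Z = (H[labels == c] - H[labels == c].mean(0)) / nu ** (k - 0.5)
    print(f"community {c}: mean {Z.mean():+.3f}, std {Z.std():.3f}")
```

Under this reading, a histogram of `Z` per community would approximate one component of the limiting Gaussian mixture; the precise hypotheses and scaling constants are those of the paper, not this sketch.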
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 23588