Geodesic Distributions Reveal How Heterophily and Bottlenecks Limit the Expressive Power of Message Passing Neural Networks

Published: 18 Nov 2023, Last Modified: 25 Nov 2023, LoG 2023 Poster
Keywords: message passing neural networks, expressive power, statistical graph ensembles, graph geodesic length distribution, graph bottlenecks, heterophily
TL;DR: We provide a statistical framework for understanding the relationship between MPNN performance on node classification tasks and graph structural properties, such as heterophily and bottlenecks, using statistical graph ensembles and geodesic distributions.
Abstract: Despite their success in graph representation learning, message passing neural networks (MPNNs) are known to struggle to learn expressive feature representations for node classification on certain unfavourable graph structures, especially heterophilic and bottlenecked graphs, which have previously been studied extensively but separately. In this paper we develop a theoretical framework to understand the combined effect of heterophily and bottlenecking on the expressive power of MPNNs. We provide a statistical perspective on MPNN performance that decomposes it into expressive power, measured by "signal sensitivity", which encodes the MPNN's maximal sensitivity to changes in the mean input features of each node class and ought to be maximised, and generalisation power, measured by "noise sensitivity", which ought to be minimised. We then relate signal sensitivity to the graph structure through $\ell$-order homophily, a quantity that captures both the homophily and the bottlenecking behaviour of graphs, a phenomenon we refer to as "homophilic bottlenecking". Pushing the statistical view further by assuming a distribution over graph structures yields a natural decoupling of bottlenecking in an $\ell$-layer MPNN into two terms, measuring underreaching and oversquashing respectively, which makes use of the distribution of geodesic distances up to length $\ell$ in the graph. Using an asymptotic distribution of geodesic distances in a very general random graph family, we derive tight bounds on $\ell$-order homophily, thus providing a complete analytic characterisation of homophilic bottlenecking in MPNNs. Notably, we show that our statistic accurately tracks empirical node classification performance. Our findings offer an interpretable statistical approach for understanding MPNN performance across a variety of graph families, and suggest potentially promising ways to design more powerful MPNNs.
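As a rough illustration of the kind of quantity the abstract refers to, the following Python sketch counts, for each geodesic distance up to $\ell$, how many ordered node pairs share a class label and how many do not. This is only an assumed proxy: the abstract does not give the formal definition of $\ell$-order homophily, and the function name `geodesic_label_profile`, the toy graph, and its labels are all hypothetical.

```python
from collections import Counter, deque


def geodesic_label_profile(adj, labels, max_len):
    """Count ordered node pairs at each geodesic distance 1..max_len,
    split into (same-label, different-label) pairs.

    Illustrative proxy only; not the paper's definition of l-order homophily.
    """
    same, diff = Counter(), Counter()
    for src in adj:
        # Breadth-first search from src, truncated at depth max_len.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == max_len:
                continue  # do not expand beyond the requested length
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if d == 0:
                continue  # skip the source itself
            if labels[v] == labels[src]:
                same[d] += 1
            else:
                diff[d] += 1
    return {k: (same[k], diff[k]) for k in range(1, max_len + 1)}


# Hypothetical toy graph: two triangles, one per class, joined by a single
# bridge edge (2, 3) that acts as a bottleneck.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

profile = geodesic_label_profile(adj, labels, max_len=3)
for k, (n_same, n_diff) in profile.items():
    print(f"distance {k}: same-class pairs={n_same}, different-class pairs={n_diff}")
```

In this toy graph, same-class pairs occur only at distance 1, while every pair at distance 2 or 3 crosses the bridge and joins different classes. It is a small instance of label agreement varying with geodesic distance, which is the structural information the abstract's framework draws on.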
Submission Type: Extended abstract (max 4 main pages).
Submission Number: 130