Understanding convolution on graphs via energies

Published: 23 Aug 2023, Last Modified: 23 Aug 2023. Accepted by TMLR.
Abstract: Graph Neural Networks (GNNs) typically operate by message-passing, where the state of a node is updated based on the information received from its neighbours. Most message-passing models act as graph convolutions, where features are mixed by a shared, linear transformation before being propagated over the edges. On node-classification tasks, graph convolutions have been shown to suffer from two limitations: poor performance on heterophilic graphs, and over-smoothing. It is common belief that both phenomena occur because such models behave as low-pass filters, meaning that the Dirichlet energy of the features decreases along the layers incurring a smoothing effect that ultimately makes features no longer distinguishable. In this work, we rigorously prove that simple graph-convolutional models can actually enhance high frequencies and even lead to an asymptotic behaviour we refer to as over-sharpening, opposite to over-smoothing. We do so by showing that linear graph convolutions with symmetric weights minimize a multi-particle energy that generalizes the Dirichlet energy; in this setting, the weight matrices induce edge-wise attraction (repulsion) through their positive (negative) eigenvalues, thereby controlling whether the features are being smoothed or sharpened. We also extend the analysis to non-linear GNNs, and demonstrate that some existing time-continuous GNNs are instead always dominated by the low frequencies. Finally, we validate our theoretical findings through ablations and real-world experiments.
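The smoothing versus sharpening claim in the abstract can be illustrated with a minimal numerical sketch. Everything specific below is an assumption made for illustration and is not taken from the paper or its code: the toy graph, the residual update X <- X + tau * A_norm X W, the step size, and the helper names `normalized_dirichlet_energy` and `run_linear_gcn`. The sketch compares a symmetric weight matrix with only positive eigenvalues against one with only negative eigenvalues and tracks the Dirichlet energy divided by the squared feature norm.

```python
import numpy as np

# Toy graph (assumed for illustration): a triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))  # D^{-1/2} A D^{-1/2}
L = np.eye(n) - A_norm                    # symmetric normalized Laplacian


def normalized_dirichlet_energy(X):
    """Dirichlet energy trace(X^T L X) divided by the squared Frobenius norm of X."""
    return np.trace(X.T @ L @ X) / np.sum(X ** 2)


def run_linear_gcn(X0, W, tau=0.5, steps=200):
    """Hypothetical residual linear graph convolution: X <- X + tau * A_norm X W."""
    X = X0.copy()
    for _ in range(steps):
        X = X + tau * A_norm @ X @ W
    return X


rng = np.random.default_rng(0)
X0 = rng.standard_normal((n, 2))          # random initial node features

# Symmetric weights: eigenvalues (0.8, 1.2) versus (-0.8, -1.2).
W_attract = np.array([[1.0, 0.2], [0.2, 1.0]])
W_repel = -W_attract

smooth = normalized_dirichlet_energy(run_linear_gcn(X0, W_attract))
sharp = normalized_dirichlet_energy(run_linear_gcn(X0, W_repel))
lap_max = np.linalg.eigvalsh(L).max()     # largest Laplacian eigenvalue

print("positive eigenvalues:", smooth)
print("negative eigenvalues:", sharp, "| largest Laplacian eigenvalue:", lap_max)
```

In this toy setting, positive eigenvalues drive the normalized energy towards zero (the lowest frequency dominates), while negative eigenvalues drive it towards the largest eigenvalue of the Laplacian (the highest frequency dominates), which matches the over-smoothing and over-sharpening regimes described in the abstract.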
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: List of updates to the manuscript in light of the reviewers’ comments:
- Added a sentence at the beginning of page 2 to highlight that Q.1 and Q.2 are related and to clarify how they look at different aspects of the same problem (frequency response for the former, limit behaviour for the latter). To compensate for space, we have slightly rephrased the second bullet point of the contributions.
- On page 7, above the statement of Theorem 4.3, we have added a paragraph “Similarly to..” where we clarify that the exact spectral analysis in the time-continuous and time-discrete cases is performed for a simplified version of the gradient flow system. We have also stated more visibly the assumption used throughout the rest of Section 4 and Section 5 to help the reader. We note that the results extend straightforwardly to some versions of the more general gradient flow equations of Proposition 1 (e.g. when $\boldsymbol{\Omega}$ and $\mathbf{W}$ commute); however, this could deviate too much from our story. In fact, the purpose of our work is to show that there exist simple graph convolutions that can enhance the high frequencies and induce behaviours other than over-smoothing. On the other hand, we highlight that the results in Section 6 for the non-linear layers hold in the generality of Eq. 1.
- In line with the previous point, we have modified Eq. (10) at the beginning of Section 5 to be the simplified gradient flow system we consider for Theorem 5.1.
- We have added in Proposition 4.1 that the graph is assumed to have at least one non-trivial edge to avoid corner cases. We have also extended the proof of Proposition 4.1 in Appendix B.1 with more details.
- Added a sentence below Eq. (12) to clarify that if (12) holds, then $\mathbf{W}$ must have negative eigenvalues, i.e. $\mu_0 < 0$.
- Added a clarification to the statement of Theorem 6.1 to specify that if the matrix $\boldsymbol{\Omega}\otimes\mathbf{I} - \mathbf{W}\otimes\boldsymbol{\mathsf{A}}$ has no positive eigenvalues, then we can simply take $c$ to be zero.
- Added a derivation of Eq. (6) in Appendix B.1.
- We have removed item (i) from page 7 and instead added an equivalent comment on page 8 below Eq. (11), starting “In fact we note..”.
- Inverted the ratio of the normalized energies in the last row of Table 2 to improve readability.
- Added a reference to “A critical look at the evaluation of GNNs under heterophily: are we really making progress?” in the last paragraph of page 1.
- Added references to “Magnet: A neural network for directed graphs”, “ACMP: Allen-Cahn Message Passing for Graph Neural Networks with Particle Phase Transition”, “Revisiting heterophily for graph neural networks”, and “Improving Graph Neural Networks with Learnable Propagation Operators” in the Related Work section.
- We have changed the titles of Subsections 4.1 and 4.2 to “Non-learnable case” and “learnable case”, respectively, to avoid any confusion.
- Clarified the differences between the results in Table 1 and those in Table 3 in the appendix via an additional paragraph below Eq. (35).
- Removed the reference to Cornell_old and clarified the instance of the dataset used in the experiments.
Code: https://github.com/JRowbottomGit/graff
Assigned Action Editor: ~Guillaume_Rabusseau1
Submission Number: 1205