Node Classification in the Heterophilic Regime via Diffusion-Jump GNNs

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: learning on graphs and other geometries & topologies
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: graph neural networks, heterophily, homophily, node classification, diffusion, Dirichlet problem, high-order graph neural networks, structural filters
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Dealing with node heterophily by learning structural filters using learnable diffusion distances.
Abstract: In the heterophilic regime (HR), vanilla GNNs learn latent spaces where nodes with different labels may have similar embeddings, and node-classification performance degrades significantly as a result. Existing heterophily metrics, however, count local discontinuities instead of characterizing heterophily in a structural way. In the ideal (homophilic) regime, nodes belonging to the same community have the same label: most nodes are harmonic (their unknown labels result from averaging those of their neighbors, given some labeled nodes). Harmonic solvers are natural minimizers of the Laplacian Dirichlet energy, so a homophilic network is more harmonic than any heterophilic version of the same network. In other words, heterophily can be seen as a “loss of harmonicity”. In this paper, we define “structural heterophily” as the ratio between the harmonicity of the network (Laplacian Dirichlet energy) and the harmonicity of its homophilic version (the so-called “ground” energy). We also propose a novel GNN model, the Diffusion-Jump GNN, that bypasses structural heterophily by “jumping” through the network in order to relate distant homologs. However, instead of using hops as standard high-order (HO) GNNs such as MixHop do, our jumps are rooted in a well-known structural metric: the diffusion distance. Given the diffusion distance matrix (DM), we explore different orders of distance with respect to each node (closest node, second-closest node, etc.) in parallel. Each parallel exploration defines a “jump” that masks the network: it is a new graph that feeds a vanilla GNN layer. Consequently, different GNNs attend to different slices of the DM, which allows distant homologs to have similar embeddings in at least one of the jumps. In addition, since the final embedding of each node is the concatenation of its parallel embeddings, we can capture the contribution (explainability) of each jump via learnable coefficients. Since computing the DM is the core of this method, our main contribution is that we learn both the diffusion distances and the “coefficients” of the edges associated with each jump, thus defining “learnable structural filters”. To learn the DM, we exploit the fact that diffusion distances have a spectral interpretation. Instead of computing the eigenvectors of the Laplacian, we learn orthogonal approximations of the Fiedler vector by solving a trace-ratio optimization problem while the prediction loss is minimized. This leads to an interplay between a Dirichlet loss, which captures low-frequency content, and a prediction loss, which refines that content and yields empirical eigenfunctions. Finally, our experimental results show that our method is highly competitive with the state of the art on both homophilic and heterophilic datasets, even on large graphs.
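The following is a minimal, hypothetical sketch of the jump mechanism only, not the paper's implementation: it assumes a precomputed dense diffusion-distance matrix D, whereas the paper learns the DM jointly through orthogonal approximations of the Fiedler vector, and it uses a plain GCN layer per jump. All names (build_jump_edges, DiffusionJumpSketch, alpha) are invented for illustration.

```python
# Sketch: jump k links every node to its k-th closest node under a given
# diffusion-distance matrix D; each jump feeds its own vanilla GNN layer,
# and the per-jump embeddings are weighted and concatenated.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


def build_jump_edges(D: torch.Tensor, k: int) -> torch.Tensor:
    """Edge index of jump k: each node i is linked to its k-th closest node under D."""
    D = D.clone()
    D.fill_diagonal_(float("inf"))          # ignore self-distances
    order = D.argsort(dim=1)                # neighbors sorted by diffusion distance
    src = torch.arange(D.size(0), device=D.device)
    dst = order[:, k - 1]                   # k-th closest node of every source
    return torch.stack([src, dst], dim=0)   # shape (2, n)


class DiffusionJumpSketch(nn.Module):
    """One GCN layer per jump; per-jump embeddings are weighted and concatenated."""

    def __init__(self, in_dim: int, hid_dim: int, num_classes: int, num_jumps: int):
        super().__init__()
        self.convs = nn.ModuleList([GCNConv(in_dim, hid_dim) for _ in range(num_jumps)])
        self.alpha = nn.Parameter(torch.ones(num_jumps))   # learnable per-jump coefficients
        self.out = nn.Linear(num_jumps * hid_dim, num_classes)

    def forward(self, x: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
        parts = []
        for k, conv in enumerate(self.convs, start=1):
            edge_index = build_jump_edges(D, k)             # jump k masks the network
            parts.append(self.alpha[k - 1] * conv(x, edge_index).relu())
        return self.out(torch.cat(parts, dim=1))            # fused node embedding -> logits
```

In this reading, the alpha coefficients play the role of the learnable per-jump weights mentioned in the abstract, exposing how much each slice of the DM contributes to the final prediction.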
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3270