Keywords: Data augmentation, Graph representation learning, GNNs, Homophily, Heterophily
TL;DR: We should pay attention to data enrichment
Abstract: It is widely assumed that standard GNNs perform better on graphs with high homophily, which has led to the development of specialised algorithms for heterophilic datasets in recent years. In this work, we both challenge and leverage this assumption. Rather than creating new algorithms, we emphasise the importance of understanding and enriching the data. We introduce a novel data engineering technique, \textit{Spectral Highways}, that enhances the performance of both heterophilic and non-heterophilic GNNs on heterophilic datasets. Our method augments a given heterophilic graph by adding supernodes, thereby creating a network of highways connecting spectral clusters in the graph. These highways provide additional paths that reduce average shortest path lengths and bring similar nodes closer together than dissimilar ones. We draw both intuitive and empirical connections between the relative decreases in intraclass and interclass average shortest path lengths and shifts in the graph's homophily levels, providing a novel perspective that extends beyond traditional homophily measures. We conduct extensive experiments on seven heterophilic datasets using various GNN architectures and also compare with data-centric techniques, demonstrating significant improvements in node classification performance. Furthermore, our empirical findings highlight the strong sensitivity of several recent GNNs to the random seed used for data splitting, underscoring the importance of this often-overlooked factor in GNN evaluation.
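To make the idea of supernode highways and the intraclass/interclass path-length comparison concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the function names (`add_supernodes`, `avg_path_lengths`), the choice of one supernode per cluster, and the fully connected supernode layer are all assumptions made for illustration, and the clusters are supplied by hand rather than computed by spectral clustering.

```python
from collections import deque
from itertools import combinations


def add_supernodes(adj, clusters):
    """Return a copy of `adj` with one hypothetical supernode per cluster.

    Each supernode is linked to every member of its cluster, and all
    supernodes are linked to each other (an assumed "highway" layout).
    """
    new_adj = {u: set(vs) for u, vs in adj.items()}
    supernodes = []
    for k, members in enumerate(clusters):
        s = ("super", k)  # illustrative node id, distinct from original ids
        new_adj[s] = set()
        for u in members:
            new_adj[s].add(u)
            new_adj[u].add(s)
        supernodes.append(s)
    for a, b in combinations(supernodes, 2):
        new_adj[a].add(b)
        new_adj[b].add(a)
    return new_adj


def bfs_distances(adj, src):
    """Unweighted shortest-path distances from `src` via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def avg_path_lengths(adj, labels):
    """Mean shortest-path length over same-class and different-class pairs.

    Only labelled (original) nodes are paired; supernodes act as relays.
    Assumes at least one connected pair of each kind exists.
    """
    dist = {u: bfs_distances(adj, u) for u in labels}
    intra, inter = [], []
    for u, v in combinations(labels, 2):
        if v not in dist[u]:
            continue  # skip disconnected pairs
        (intra if labels[u] == labels[v] else inter).append(dist[u][v])
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy heterophilic graph: a 6-node path whose neighbours alternate classes.
adj = {i: set() for i in range(6)}
for i in range(5):
    adj[i].add(i + 1)
    adj[i + 1].add(i)
labels = {i: i % 2 for i in range(6)}
clusters = [[0, 2, 4], [1, 3, 5]]  # hand-picked stand-in for spectral clusters

intra_before, inter_before = avg_path_lengths(adj, labels)
augmented = add_supernodes(adj, clusters)
intra_after, inter_after = avg_path_lengths(augmented, labels)

rel_intra = (intra_before - intra_after) / intra_before
rel_inter = (inter_before - inter_after) / inter_before
print(f"intraclass: {intra_before:.3f} -> {intra_after:.3f}")
print(f"interclass: {inter_before:.3f} -> {inter_after:.3f}")
```

In this toy case the clusters happen to coincide with the classes, so the intraclass average falls by a larger fraction than the interclass average. On real data the outcome depends on how well the clusters align with the labels.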
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6968