Leveraging Invariant Principle for Heterophilic Graph Structure Distribution Shifts

Published: 29 Jan 2025, Last Modified: 29 Jan 2025 · WWW 2025 Oral · CC BY 4.0
Track: Graph algorithms and modeling for the Web
Keywords: Graph Representation Learning, Node Classification, Invariant Learning, Distribution Shifts, Heterophily and Homophily
TL;DR: This paper investigates a new type of structure distribution shift on heterophilic graphs and proposes a novel invariant learning framework, without augmentation, to mitigate it.
Abstract: Heterophilic Graph Neural Networks (HGNNs) have shown promising results on semi-supervised learning tasks on graphs. Notably, most real-world heterophilic graphs contain a mixture of nodes with different neighbor patterns, exhibiting both homophilic and heterophilic local structures at the node level. However, existing works are devoted only to designing better unified HGNN backbones for node classification on heterophilic and homophilic graph benchmarks simultaneously, and their node-level analyses of HGNN performance assume a fixed data distribution, without examining the effect of differing structural patterns between training and testing nodes. How to learn invariant node representations on heterophilic graphs that handle this structural difference, or distribution shift, remains unexplored. In this paper, we first discuss the limitations of previous graph-based invariant learning methods in addressing heterophilic graph structure distribution shifts from the perspective of data augmentation. We then propose HEI, a framework that generates invariant node representations by incorporating heterophily information, namely each node's estimated neighbor pattern, to infer latent environments without augmentation; these inferred environments are then used for invariant prediction. We provide detailed theoretical guarantees that justify the design of HEI. Extensive experiments on various benchmarks and backbones demonstrate the effectiveness and robustness of our method compared with existing state-of-the-art baselines.
Submission Number: 2534