Distribution Shift Resilient GNN via Mixture of Aligned Experts

19 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: representation learning for computer vision, audio, language, and other modalities
Keywords: Graph Neural Networks, Generalization, Distribution Shifts, Mixture-of-experts
TL;DR: We introduce GraphMETRO, a novel framework based on a mixture-of-experts architecture to improve GNN generalization under complex distribution shifts.
Abstract: The ability of Graph Neural Networks (GNNs) to generalize to diverse and unseen distributions is critical for real-world applications. However, previous works mostly address specific types of distribution shift, e.g., larger graph size or node degree, which is limiting when multiple, nuanced distribution shifts occur together. For example, a node in a social graph may see both increased interactions and altered features, while its neighboring nodes may undergo different shifts. Failing to account for such complex distribution shifts substantially hinders generalization in practice. Here we introduce GraphMETRO, a novel framework based on a mixture-of-experts (MoE) architecture that enhances GNN generalizability for both node- and graph-level tasks. The core of GraphMETRO is a hierarchical architecture composed of a gating model and multiple expert models that are aligned in a common representation space. Specifically, the gating model identifies the significant mixture components that govern the distribution shift on a node or graph instance. Each aligned expert produces representations invariant to one type of mixture component. Finally, GraphMETRO aggregates the representations from multiple experts to produce a representation that is invariant w.r.t. the complex distribution shift for the prediction task. Moreover, GraphMETRO interprets the type of distribution shift via the gating model and offers insights into real-world distribution shifts. Through systematic experiments, we validate the effectiveness of GraphMETRO, which outperforms Empirical Risk Minimization (ERM) by 4.6% on average on synthetic distribution shifts and achieves state-of-the-art performance on four real-world datasets from the GOOD benchmark, including 67% and 4.2% relative improvements over the best previous method on the WebKB and Twitch datasets, respectively.
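The gating-and-aggregation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax gate, the weighted-sum aggregation, the plain linear maps standing in for GNN experts, and every name below (`moe_representation`, `gate_weights`, `expert_weights`) are assumptions made only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_representation(x, gate_weights, expert_weights):
    """Combine expert outputs using gate scores (hypothetical stand-in).

    Assumption: the gate is a softmax over one score per expert, and the
    final representation is the gate-weighted sum of expert outputs.
    Each "expert" here is a linear map, not a GNN.
    """
    gate = softmax(linear(gate_weights, x))
    expert_outputs = [linear(w, x) for w in expert_weights]
    dim = len(expert_outputs[0])
    h = [sum(g * out[d] for g, out in zip(gate, expert_outputs))
         for d in range(dim)]
    return h, gate


random.seed(0)
num_experts, in_dim, out_dim = 3, 4, 2
x = [random.uniform(-1, 1) for _ in range(in_dim)]
gate_weights = [[random.uniform(-1, 1) for _ in range(in_dim)]
                for _ in range(num_experts)]
expert_weights = [[[random.uniform(-1, 1) for _ in range(in_dim)]
                   for _ in range(out_dim)]
                  for _ in range(num_experts)]

h, gate = moe_representation(x, gate_weights, expert_weights)
print("gate weights:", gate)
print("aggregated representation:", h)
```

In this sketch the gate weights play the interpretive role the abstract attributes to the gating model: a larger weight suggests that the corresponding expert's shift component matters more for the given input.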
Submission Number: 1560