The Snowflake Hypothesis: Training Deep GNN with One Node One Receptive Field

19 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep Graph Neural Network, Graph Neural Network, Network Pruning, The Lottery Ticket Hypothesis
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper introduces the Snowflake Hypothesis to address the overfitting and oversmoothing issues in training GNNs. It employs graph/network pruning techniques so that each node can have its own independent receptive field.
Abstract: Although Graph Neural Networks (GNNs) have demonstrated considerable promise in graph representation learning tasks, they face significant over-fitting and over-smoothing issues as they go deeper, much like models in the computer vision (CV) realm. Given that the potency of numerous CV and language models is attributable to the ability to reliably train very deep architectures, we conduct a systematic study of deep GNN research trajectories. Our findings indicate that the current success of deep GNNs primarily stems from (I) the adoption of innovations from CNNs, such as residual/skip connections, or (II) tailor-made aggregation algorithms such as DropEdge. However, these algorithms often lack intrinsic interpretability and indiscriminately treat all nodes within a given layer in the same manner, thereby failing to capture the nuanced differences among various nodes. In this paper, we introduce \textit{the Snowflake Hypothesis} -- a novel paradigm underpinning the concept of ``one node, one receptive field''. The hypothesis draws inspiration from the unique and individualistic pattern of each snowflake, proposing a corresponding uniqueness in the receptive field of each node in a GNN. We employ the simplest gradient and node-level cosine distance as guiding principles to regulate the aggregation depth for each node, and conduct comprehensive experiments including: (1) different training schemes; (2) various shallow and deep GNN backbones, especially deep frameworks such as JKNet, ResGCN, and PairNorm; (3) various numbers of layers (8, 16, 32, 64) on multiple benchmarks (six graphs, including dense graphs with millions of nodes); and (4) comparisons with different aggregation strategies. The observational results demonstrate that our framework can serve as a universal operator for a range of tasks and displays tremendous potential on deep GNNs. It can be applied to various GNN frameworks, enhances their effectiveness at depth, and guides the selection of the optimal network depth in an explainable and generalizable way.
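Below is a minimal sketch of the "one node, one receptive field" mechanism described in the abstract, written as a plain-PyTorch GCN-style aggregation. It is an illustrative assumption, not the authors' implementation: the function name snowflake_aggregate, the threshold tau, and the freeze-and-keep rule are all hypothetical, and the gradient-based criterion mentioned in the abstract is omitted for brevity. Each node stops aggregating once the node-level cosine distance between its consecutive-layer representations falls below tau, which fixes that node's individual aggregation depth.

import torch
import torch.nn.functional as F

def snowflake_aggregate(x, adj_norm, weights, tau=0.02):
    # x: [N, d] node features; adj_norm: [N, N] normalized dense adjacency;
    # weights: list of [d, d] weight matrices, one per layer (max depth = len(weights)).
    h = x
    active = torch.ones(x.size(0), dtype=torch.bool)   # nodes still aggregating
    depth = torch.zeros(x.size(0), dtype=torch.long)   # per-node aggregation depth
    for W in weights:
        h_new = F.relu(adj_norm @ h @ W)               # one GCN-style propagation step
        # node-level cosine distance between consecutive representations
        dist = 1.0 - F.cosine_similarity(h_new, h, dim=1)
        # freeze nodes whose representation has effectively stopped changing
        active = active & (dist >= tau)
        # active nodes take the new embedding; frozen nodes keep their last one
        h = torch.where(active.unsqueeze(1), h_new, h)
        depth += active.long()
    return h, depth  # depth[i] = number of layers node i actually aggregated

# Toy usage on a random symmetric graph with self-loops (all values illustrative).
N, d = 100, 16
A = (torch.rand(N, N) < 0.05).float()
A = ((A + A.t() + torch.eye(N)) > 0).float()
deg = A.sum(dim=1)
adj_norm = A / (deg.sqrt().unsqueeze(1) * deg.sqrt().unsqueeze(0))  # D^-1/2 A D^-1/2
weights = [torch.randn(d, d) * 0.1 for _ in range(8)]               # up to 8 layers
h, depth = snowflake_aggregate(torch.randn(N, d), adj_norm, weights)
print(depth.bincount())  # distribution of per-node receptive-field depths

Note that under this rule frozen nodes still contribute their fixed embeddings to their neighbors' aggregation, so capping one node's depth prunes its receptive field without removing it from the graph.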
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1691