DGTAT: DECOUPLED GRAPH TRIPLE ATTENTION NETWORKS

17 Sept 2023 (modified: 04 Feb 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Graph Transformer, Node Classification, Decoupled Attention, Positional Encoding, Structural Encoding, Virtual Node
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a novel node classification model, based on the MPNN+VN paradigm and a sampling strategy, that decouples local and global interactions, separates learnable positional, attribute, and structural encodings, and excels across a variety of datasets.
Abstract: The Message Passing Neural Network (MPNN) is a foundational paradigm in graph learning, demonstrating remarkable efficacy in its early implementations. Recent research has focused on applying Transformers to graph data, or combining Transformers with MPNNs, to address issues such as over-squashing and over-smoothing while capturing long-range dependencies. However, Graph Transformers (GTs) often perform poorly on small datasets. More seriously, much of the positional and structural information encoded by GT-based methods is coupled with node attribute information, so propagating position and structure signals also distorts the attribute encoding, implicitly limiting expressiveness. In this paper, we analyze the factors influencing the performance of graph learning models. We then introduce a novel model, named Decoupled Graph Triple Attention Networks (DGTAT). Built on the MPNN+VN paradigm and a sampling strategy, DGTAT effectively decouples local and global interactions, separates learnable positional, attribute, and structural encodings, and computes triple attention. This design allows DGTAT to capture long-range dependencies akin to Transformers while preserving the inductive bias of the graph topology, yielding robust performance across graphs of varying sizes and excelling on both large and small datasets. DGTAT achieves state-of-the-art empirical performance on a variety of node classification tasks, and ablation experiments elucidate the importance of each decoupled design factor within the model. Compared to GT-based models, our model offers enhanced interpretability and flexibility.
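The abstract does not pin down the exact formulation, but the core decoupling idea can be illustrated with a minimal, hypothetical PyTorch sketch: three separate attention score streams, computed from attribute, positional, and structural encodings respectively, are summed before the softmax, while the value stream carries only node attributes, so position and structure steer the attention weights without mixing into the attribute representation. All names here (DecoupledTripleAttention, adj_mask, etc.) are our own illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of "triple attention" with decoupled encodings.
# The paper's actual formulation (heads, normalization, VN handling,
# sampling of global edges) may differ; this only shows the decoupling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledTripleAttention(nn.Module):
    def __init__(self, attr_dim, pos_dim, struct_dim, hidden_dim):
        super().__init__()
        # Independent query/key projections per encoding, so positional
        # and structural signals never leak into the attribute values.
        self.q_attr = nn.Linear(attr_dim, hidden_dim)
        self.k_attr = nn.Linear(attr_dim, hidden_dim)
        self.q_pos = nn.Linear(pos_dim, hidden_dim)
        self.k_pos = nn.Linear(pos_dim, hidden_dim)
        self.q_struct = nn.Linear(struct_dim, hidden_dim)
        self.k_struct = nn.Linear(struct_dim, hidden_dim)
        self.v = nn.Linear(attr_dim, hidden_dim)  # values: attributes only
        self.scale = hidden_dim ** -0.5

    def forward(self, x, pos, struct, adj_mask):
        # x:      [N, attr_dim]   node attribute features
        # pos:    [N, pos_dim]    learnable positional encodings
        # struct: [N, struct_dim] structural encodings
        # adj_mask: [N, N] bool, True where attention is allowed, e.g.
        #   local neighborhoods plus sampled/virtual-node global edges,
        #   keeping local and global interactions decoupled.
        s_attr = (self.q_attr(x) @ self.k_attr(x).T) * self.scale
        s_pos = (self.q_pos(pos) @ self.k_pos(pos).T) * self.scale
        s_struct = (self.q_struct(struct) @ self.k_struct(struct).T) * self.scale
        scores = s_attr + s_pos + s_struct          # triple attention
        scores = scores.masked_fill(~adj_mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return attn @ self.v(x)                     # updates attributes only
```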
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 878