UnifiedGT: Exploring the Effective Ingredients of Transformers in Large Graphs

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: graph transformer, graph learning, graph neural network, large-scale graph, heterogeneous graph, homogeneous graph
Abstract: In recent years, transformer models have demonstrated great potential for modeling graph-structured data, and many graph transformers (GTs) have been proposed and applied to graph representation learning tasks. However, existing GTs have mostly been applied to small graphs, and the ingredients critical to their success, as well as the interplay among these ingredients when processing large graphs, remain poorly understood. Through a systematic investigation of using GTs on large graphs, we find that (i) explicit graph structure injection through direct neighbor attention masking is significantly more effective than implicitly using graph structure through positional encoding; (ii) combining a direct neighbor-attended GT with a message-passing graph neural network (MP-GNN) boosts accuracy; and (iii) the feed-forward network (FFN) acts as a semantic mixer and plays an important role, even though some existing GTs neglect it. As part of our systematic investigation, we break down the design space of state-of-the-art GTs and introduce a modular unified GT framework, called UnifiedGT, which is effective at handling both large-scale heterogeneous and homogeneous graph data. UnifiedGT consists of five major components: (i) graph sampling, (ii) structural prior injection, (iii) attention calculation, (iv) composition of local message passing and long-range attention, and (v) the fully-connected layer. UnifiedGT provides different options for each component, enabling practitioners to create new GT methods that significantly improve accuracy over existing methods. Based on comprehensive experiments using UnifiedGT on the Open Academic Graph, we identify our best-performing method, ParDNTrans (a GT with a parallel-connected MP-GNN and direct neighbor attention masking), which boosts accuracy by 4.5–5.3% over the state-of-the-art graph transformer.
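To make the abstract's ingredients concrete, the following is a minimal sketch, not the authors' implementation, of what a ParDNTrans-style layer could look like under simplifying assumptions: a dense boolean adjacency matrix with self-loops, attention masked to direct neighbors (structural prior injection), a parallel mean-aggregation message-passing branch, and an FFN block. All class, variable, and parameter names (ParDNTransLayer, gnn_lin, etc.) are illustrative and not taken from the paper.

```python
# Hypothetical sketch of a ParDNTrans-style layer (not the authors' code).
# Assumes a dense (N, N) boolean adjacency matrix `adj` that includes self-loops.
import torch
import torch.nn as nn


class ParDNTransLayer(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gnn_lin = nn.Linear(dim, dim)  # parallel message-passing branch
        self.ffn = nn.Sequential(            # FFN as semantic mixer
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; adj: (N, N) boolean adjacency with self-loops.
        # Structural prior injection: attend only to direct neighbors.
        attn_mask = ~adj  # True entries are *blocked* by MultiheadAttention
        h_attn, _ = self.attn(x.unsqueeze(0), x.unsqueeze(0), x.unsqueeze(0),
                              attn_mask=attn_mask)
        # Parallel composition: mean-aggregate neighbor messages (simple MP-GNN).
        deg = adj.float().sum(-1, keepdim=True).clamp(min=1)
        h_gnn = self.gnn_lin(adj.float() @ x / deg)
        h = self.norm1(x + h_attn.squeeze(0) + h_gnn)
        # FFN block with residual connection.
        return self.norm2(h + self.ffn(h))


# Toy usage: 5 nodes with 16-dim features on a ring graph with self-loops.
x = torch.randn(5, 16)
eye = torch.eye(5, dtype=torch.bool)
adj = eye | torch.roll(eye, 1, 0) | torch.roll(eye, -1, 0)
out = ParDNTransLayer(16)(x, adj)
print(out.shape)  # torch.Size([5, 16])
```

In this sketch the attention branch, the message-passing branch, and the input are summed before the FFN, which is one plausible reading of "parallel-connected"; the paper's actual composition options within UnifiedGT may differ.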
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8224