Pure Transformers are Powerful Graph Learners

Published: 31 Oct 2022, Last Modified: 12 Mar 2024
NeurIPS 2022 Accept
Keywords: graph, transformer, self-attention, graph neural network, graph transformer, equivariant neural network, permutation equivariance, graph positional embedding
TL;DR: We show that standard Transformers without graph-specific modifications can work well in graph learning both in theory and practice.
Abstract: We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. Given a graph, we simply treat all nodes and edges as independent tokens, augment them with token embeddings, and feed them to a Transformer. With an appropriate choice of token embeddings, we prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing Graph Neural Networks (GNNs). When trained on a large-scale graph dataset (PCQM4Mv2), our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results with Transformer variants that carry sophisticated graph-specific inductive biases. Our implementation is available at https://github.com/jw9730/tokengt.
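
The tokenization the abstract describes is simple enough to sketch. The snippet below is a minimal PyTorch illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the class name `TokenGTSketch` and its parameters are hypothetical, node identifiers are taken as orthonormal random vectors (the paper also considers Laplacian eigenvectors), and `id_dim >= n` is assumed so that such identifiers exist.

```python
import torch
import torch.nn as nn

class TokenGTSketch(nn.Module):
    """Minimal sketch of the tokenized-graph idea: every node and every
    edge of the input graph becomes one token of a plain Transformer."""

    def __init__(self, feat_dim, id_dim, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.id_dim = id_dim
        # a token is [feature | id of endpoint u | id of endpoint v | type one-hot]
        self.proj = nn.Linear(feat_dim + 2 * id_dim + 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, node_feat, edge_feat, edge_index):
        # node_feat: (n, feat_dim), edge_feat: (m, feat_dim), edge_index: (2, m)
        n = node_feat.size(0)
        # orthonormal random node identifiers (assumes id_dim >= n);
        # the paper also considers Laplacian eigenvectors as identifiers
        q, _ = torch.linalg.qr(torch.randn(self.id_dim, n, device=node_feat.device))
        ids = q.t()  # (n, id_dim), rows are orthonormal
        u, v = edge_index
        node_type = node_feat.new_tensor([1.0, 0.0]).expand(n, 2)
        edge_type = edge_feat.new_tensor([0.0, 1.0]).expand(edge_feat.size(0), 2)
        # a node token repeats its own identifier; an edge token carries both endpoints'
        node_tok = torch.cat([node_feat, ids, ids, node_type], dim=-1)
        edge_tok = torch.cat([edge_feat, ids[u], ids[v], edge_type], dim=-1)
        tokens = self.proj(torch.cat([node_tok, edge_tok], dim=0)).unsqueeze(0)
        return self.encoder(tokens).squeeze(0)  # (n + m, d_model) token embeddings
```

The orthonormal identifiers are what let plain self-attention recover graph structure: the inner product between two tokens' identifier blocks is nonzero exactly when they share an incident node, so no attention bias or message-passing machinery is required. For example, `TokenGTSketch(feat_dim=16, id_dim=64)(torch.randn(10, 16), torch.randn(20, 16), torch.randint(0, 10, (2, 20)))` returns 30 token embeddings for a graph with 10 nodes and 20 edges.
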
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2207.02505/code)