Keywords: Graph Transformer, Sparse Attention, Graph Representation Learning, Efficient Attention Mechanisms, Efficient Transformers
TL;DR: We propose NUTAG, a scalable sparse Graph Transformer using feature-based virtual nodes, enabling efficient attention with theoretical guarantees and effective learning on homophilic, heterophilic, and large-scale graphs.
Abstract: Graph Transformers (\textsc{GT}s) have demonstrated their superiority in graph classification tasks, but their performance in node classification settings remains below par. Existing models are typically designed for either homophilic or heterophilic graphs and scale poorly to graphs with millions of nodes. In this paper, we address these limitations for node classification by designing a model that uses a special feature encoding to transform the input graph by separating nodes and features, enabling information to flow not only from a node's local neighborhood but also from distant nodes via their connections through shared feature nodes. We theoretically show that this design allows each node to exchange information with every other node in the graph, effectively mimicking all-node-pair message passing while avoiding $\mathcal{O}(N^2)$ computation. We further analyze the universal approximation ability of the proposed transformer. Finally, we demonstrate the effectiveness of the proposed method on diverse sets of large-scale graphs, including both homophilic and heterophilic varieties.
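The abstract's core construction, separating nodes and features so that distant nodes communicate through shared feature nodes, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation (the paper's exact encoding and the function name `add_feature_virtual_nodes` are assumptions made here for illustration); it simply adds one virtual node per feature dimension and links each graph node to the features it activates, so two nodes sharing a feature become two hops apart without materializing all $\mathcal{O}(N^2)$ pairs.

```python
# Minimal sketch (assumed, not the paper's code) of feature-based virtual nodes:
# each feature dimension becomes a virtual node; graph nodes are linked to the
# virtual nodes of the features they activate. Thresholding dense features into
# binary indicators is an illustrative assumption, not taken from the paper.
import numpy as np


def add_feature_virtual_nodes(edge_index: np.ndarray,
                              features: np.ndarray,
                              threshold: float = 0.0):
    """Return an augmented edge list over N graph nodes + D feature nodes.

    edge_index : (2, E) array of the original node-node edges.
    features   : (N, D) node feature matrix.
    threshold  : a feature value > threshold links a node to that feature node.
    """
    num_nodes, num_feats = features.shape
    # Virtual feature nodes get indices N .. N + D - 1.
    node_idx, feat_idx = np.nonzero(features > threshold)
    feat_idx = feat_idx + num_nodes

    # Node -> feature and feature -> node edges (undirected augmentation).
    nf_edges = np.stack([node_idx, feat_idx])
    fn_edges = np.stack([feat_idx, node_idx])

    augmented = np.concatenate([edge_index, nf_edges, fn_edges], axis=1)
    return augmented, num_nodes + num_feats


if __name__ == "__main__":
    # Toy path graph 0-1-2-3: nodes 0 and 3 are far apart but share feature 1,
    # so they become connected through the virtual node with index N + 1.
    edges = np.array([[0, 1, 2],
                      [1, 2, 3]])
    X = np.array([[1.0, 1.0],
                  [0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
    aug_edges, total_nodes = add_feature_virtual_nodes(edges, X)
    print("total nodes (graph + feature):", total_nodes)
    print(aug_edges)
```

Under this augmentation, attention or message passing restricted to the sparse augmented edge set still lets any two nodes that share a feature exchange information in two hops, which is the mechanism the abstract attributes to the feature-based virtual nodes.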
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 13641