Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata
Abstract: A current goal in the graph neural network literature is to enable transformers to operate on graph-structured data, given their success on language and vision tasks. Since the transformer's original sinusoidal positional encodings (PEs) are not applicable to graphs, recent work has focused on developing graph PEs, rooted in spectral graph theory or various spatial features of a graph. In this work, we introduce a new graph PE, Graph Automaton PE (GAPE), based on weighted graph-walking automata (a novel extension of graph-walking automata). We compare the performance of GAPE with other PE schemes on both machine translation and graph-structured tasks, and we show that it generalizes several other PEs. An additional contribution of this study is a theoretical and controlled experimental comparison of many recent PEs in graph transformers, independent of the use of edge features.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

**15 Feb 2023**

Thank you to all reviewers for your feedback and questions. We have made the following changes.

| Change | Section |
| ----------- | ----------- |
| Added experimental results and discussion of performance of PEs on PCQM4Mv2 from the Open Graph Benchmark Large-Scale Challenge | 4.2.3 |
| Added clarifying details connecting Equations 1 and 2 | 3.2 |
| Added discussion of related work suggested by reviewer Pqa9 | 2, 3.2 |
| Added discussion of computing time for LAPE, RW, and GAPE | 4.2.3 |
| Added MT performance of LAPE using the path graph | 4.2.3 |
| Fixed various typos and notation errors defining the WGWA | 3.1 |
| Fixed the connection with Personalized PageRank | 3.3.3 |
| Clarified why $\mu$ and $\alpha$ cannot be learned | 3.2 |
| Clarified the purpose and usage of the "linear layer" as an implementation detail | 3.2 |
| Clarified and reworded the connection between GAPE and Personalized PageRank & RW | 3.3.3 |
| Clarified and amended wording describing the relation between this work and Dwivedi et al. (2020) | 4.1 |
| Removed section on simulating LAPE | x |

We have also made smaller edits, such as removing the term "directions", making Sec. 3.3.2 more precise, and amending our description of $\beta$ as the stop probability of a random walk.

**27 Feb 2023**

Thank you to all reviewers for the follow-up feedback. Along with some rewording, we made the following minor changes:

* Labeled axes in Figures 1 and 2.
* Clarified the node order in Figure 2.
* Repaired Equation 2, replacing $\mu$ with $\mu^\top$.
* Reworded description of PLANAR.
* Noted in Section 3.2 how one could in principle use any weighted adjacency matrix in GAPE's definition.
* Fixed Remark 3, replacing $\mu = I$ with $\mu = (1-\beta)I$.

**28 Mar 2023**

Camera-ready version and code uploaded.
Assigned Action Editor: ~Danny_Tarlow1
Submission Number: 692