Self-Attention in Colors: Another Take on Encoding Graph Structure in Transformers
Abstract: We introduce a novel self-attention mechanism, Chromatic Self-Attention (CSA), which extends the notion of attention scores to attention _filters_ that independently modulate the feature channels. We showcase CSA in a fully-attentional graph Transformer, the Chromatic Graph Transformer (CGT), which integrates both graph structural information and edge features, completely bypassing the need for local message-passing components. Our method encodes graph structure through node-node interactions by enriching the original edge features with a relative positional encoding scheme. We propose a new scheme based on random walks that encodes both structural and positional information, and show how to incorporate higher-order topological information, such as rings in molecular graphs. Our approach achieves state-of-the-art results on the ZINC benchmark dataset, while providing a flexible framework for encoding graph structure and incorporating higher-order topology.
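The two ideas named in the abstract, random-walk features for node pairs and attention that acts per feature channel, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `random_walk_pair_features` and `channelwise_attention`, the per-channel softmax normalisation, and the random projection `proj` are all assumptions made for illustration, and the paper's exact formulation may differ.

```python
import numpy as np


def random_walk_pair_features(adj, num_steps):
    """Stack k-step random-walk probabilities for every node pair.

    Hypothetical helper: returns an array of shape (n, n, num_steps) whose
    entry [i, j, k - 1] is the probability that a k-step walk from i ends at j.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalise the adjacency matrix; isolated nodes keep an all-zero row.
    trans = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    powers = []
    current = np.eye(adj.shape[0])
    for _ in range(num_steps):
        current = current @ trans
        powers.append(current)
    return np.stack(powers, axis=-1)


def channelwise_attention(values, pair_logits):
    """Aggregate node features with one attention weight per channel.

    Assumed normalisation: a softmax over source nodes, computed separately
    for each feature channel.

    values:      (n, d) node features to aggregate.
    pair_logits: (n, n, d) one logit per (target, source, channel).
    """
    shifted = pair_logits - pair_logits.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)  # sums to 1 over sources
    # Each channel c of node i is a weighted sum of channel c over all nodes j.
    return np.einsum("ijc,jc->ic", weights, values)


if __name__ == "__main__":
    # 4-cycle graph used only as a toy example.
    adj = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])
    rng = np.random.default_rng(0)
    values = rng.normal(size=(4, 8))
    pair_feats = random_walk_pair_features(adj, num_steps=3)
    proj = rng.normal(size=(3, 8))  # hypothetical map from pair features to channels
    out = channelwise_attention(values, pair_feats @ proj)
    print(out.shape)
```

In standard attention every channel of a node shares a single scalar weight per neighbour. In this sketch the weights carry an extra channel axis, so each channel can attend to a different set of nodes, which is the sense in which a score becomes a filter.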
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=FiI30IsMUH
Changes Since Last Submission: Camera-ready version of the manuscript, adding author names, acknowledgements and the link to the GitHub repo. Re-running the experiments from the latest code, we updated a couple of numbers in the ablation study comparing our two methods SPDE and RWSE. The conclusions are unchanged.
Supplementary Material: zip
Assigned Action Editor: ~Srinadh_Bhojanapalli1
Submission Number: 1141