Activation Function Matters in Graph Transformers

18 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: graph neural network, graph transformer, expressive power
Abstract: Following the success of Transformers in deep learning, Graph Transformers have emerged as one of the most prominent architectures for graph representation learning. At the heart of Graph Transformers lies the self-attention mechanism, which aims to emphasize information from nodes similar to the query node while suppressing information from others. However, this paper unveils a critical limitation: the attention mechanism in MPNN-based Graph Transformers cannot effectively discern the number of neighbors, resulting in restricted expressive capacity. To address this limitation, we investigate three activation functions for the attention mechanism: $softmax$, $tanh$, and $sigmoid$, and show that $sigmoid$ is the most powerful. Our study culminates in the development of an enhanced variant of the Graph Transformer, known as the Expressive Graph Transformer (EGT), and we prove that EGT effectively distinguishes the number of neighbors, removing this restriction on expressive capacity. Extensive evaluations on graph classification and node classification demonstrate the effectiveness and robustness of EGT. Our code is released at https://anonymous.4open.science/r/EGT-98CA/.
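The following is a minimal sketch (not the authors' released code; all names here are illustrative) of the limitation the abstract describes: when attention scores are normalized with $softmax$, the weights over a query node's neighbors sum to 1, so aggregating over identical neighbors yields the same output regardless of how many there are. An unnormalized $sigmoid$ gate lets the aggregated message scale with the neighbor count.

```python
# Hypothetical illustration: attention aggregation over k identical neighbors.
# With softmax, weights sum to 1, so the output is independent of k;
# with sigmoid, each neighbor contributes its own gated message, so it scales with k.
import torch

def aggregate(scores: torch.Tensor, values: torch.Tensor, activation: str) -> torch.Tensor:
    """Aggregate neighbor value vectors with attention scores under a given activation."""
    if activation == "softmax":
        weights = torch.softmax(scores, dim=0)   # normalized: weights sum to 1
    elif activation == "sigmoid":
        weights = torch.sigmoid(scores)          # independent gates, no normalization
    else:
        raise ValueError(activation)
    return (weights.unsqueeze(-1) * values).sum(dim=0)

value = torch.ones(4)       # identical neighbor feature vector
score = torch.tensor(0.3)   # identical attention logit for every neighbor

for k in (2, 5):            # query node with 2 vs. 5 identical neighbors
    scores = score.repeat(k)
    values = value.repeat(k, 1)
    print(k,
          aggregate(scores, values, "softmax"),  # same output for k = 2 and k = 5
          aggregate(scores, values, "sigmoid"))  # output grows with k
```

Under these assumptions, the softmax branch produces the same aggregated vector for both neighborhood sizes, while the sigmoid branch distinguishes them, which is the intuition behind the paper's claim that $sigmoid$ attention is more expressive in this respect.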
Supplementary Material: zip
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1226