Keywords: graph transformers, expressivity, structural encodings, structural embeddings, positional encodings
TL;DR: We provide a principled, unified graph transformer architecture, study the impact of structural embeddings on expressivity, and conduct a large-scale experimental study to complement the theory.
Abstract: Graph transformers (GTs) have demonstrated strong empirical performance; however,
current architectures vary widely in their use of attention mechanisms and positional
embeddings (PEs), as well as in their expressivity. Existing expressivity
results are often tied to specific design choices and lack comprehensive empirical
validation on large-scale data. This leaves a gap between theory and practice, preventing
generalizable insights that extend beyond particular application domains. Here, we
propose the Generalized-Distance Transformer (GDT), a GT architecture that incorporates
many recent advancements for GTs, and we develop a fine-grained understanding of its
representation power in terms of attention and PEs. Through extensive experiments, we
identify design choices that consistently perform well across various applications, tasks,
and model scales, demonstrating strong performance in a few-shot transfer setting without
the need for fine-tuning. We distill our theoretical and practical findings into several
generalizable insights about effective GT design, training, and inference.
Submission Number: 33