Keywords: graph transformers, expressivity, structural encodings, structural embeddings, positional encodings
TL;DR: We provide a principled, unified graph transformer architecture, study the impact of structural embeddings on expressivity, and conduct a large-scale experimental study to complement the theory.
Abstract: Graph transformers (GTs) have demonstrated strong empirical performance; however,
current architectures vary widely in their use of attention mechanisms and positional
embeddings (PEs), as well as in their expressivity. Existing expressivity
results are often tied to specific design choices and lack comprehensive empirical
validation on large-scale data. This leaves a gap between theory and practice, preventing
generalizable insights that extend beyond particular application domains. Here, we
propose the Generalized-Distance Transformer (GDT), a GT architecture that incorporates
many recent advancements for GTs, and we develop a fine-grained understanding of its
representation power in terms of attention and PEs. Through extensive experiments, we
identify design choices that consistently perform well across various applications, tasks,
and model scales, demonstrating strong performance in a few-shot transfer setting without
the need for fine-tuning. We distill our theoretical and practical findings into several
generalizable insights about effective GT design, training, and inference.
Submission Number: 33