Edge-aware FlexAttention Network for Efficient Graph Generation

Published: 30 May 2026, Last Modified: 01 Jun 2026
Venue: SPIGM @ ICML Poster
License: CC BY 4.0
Keywords: Graph generation, Diffusion, Graph Transformer
TL;DR: FlexAttention-enabled Graph Transformer for more efficient graph generation
Abstract: Graph generative models commonly use graph-specific Transformer architectures that jointly update node and edge features. However, these architectures do not directly benefit from recent hardware-aware attention implementations such as FlexAttention, which limits their scalability. Adapting them is non-trivial: edge and graph-level features must be injected into pairwise attention scores, graph masks must be respected, and dynamic edge updates remain a major source of quadratic memory usage. We propose a FlexAttention-compatible edge-aware Transformer for graph generation that incorporates structural information through head-wise score modulation inside the fused kernel, while updating edge representations with a lightweight residual mechanism that retains the key attention-score-dependent edge updates. This architecture achieves generation quality comparable to our baseline while reducing peak GPU memory by around 65% and substantially improving training and sampling throughput. These results suggest that the proposed architecture is a practical step toward scalable graph foundation models.
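The idea of injecting edge information into pairwise attention scores while respecting graph masks can be illustrated with a small unfused reference. This is only a sketch under stated assumptions, not the architecture from the paper: the abstract does not specify the form of the head-wise modulation, so an additive per-head bias `edge_bias[h][i][j]` is assumed here, and `make_score_mod`, `make_mask_mod`, and `reference_attention` are hypothetical names. The two callbacks loosely mirror the shape of FlexAttention's `score_mod` and `mask_mod` hooks with the batch index dropped for brevity. A fused kernel would apply the same per-pair logic without materializing the full score matrix.

```python
import math


def make_score_mod(edge_bias):
    # ASSUMPTION: edge_bias[h][i][j] is a per-head scalar derived from the
    # features of edge (i, j). An additive bias is one possible modulation.
    def score_mod(score, h, q_idx, kv_idx):
        return score + edge_bias[h][q_idx][kv_idx]
    return score_mod


def make_mask_mod(node_mask):
    # node_mask[i] is False for padding nodes; a pair is attended to only
    # when both the query node and the key node are real.
    def mask_mod(h, q_idx, kv_idx):
        return node_mask[q_idx] and node_mask[kv_idx]
    return mask_mod


def reference_attention(q, k, v, score_mod, mask_mod):
    # q, k, v are nested lists indexed as [head][node][channel].
    out = []
    for h in range(len(q)):
        n, d = len(q[h]), len(q[h][0])
        d_v = len(v[h][0])
        head_out = []
        for i in range(n):
            scores = []
            for j in range(n):
                if mask_mod(h, i, j):
                    dot = sum(a * b for a, b in zip(q[h][i], k[h][j]))
                    scores.append(score_mod(dot / math.sqrt(d), h, i, j))
                else:
                    scores.append(-math.inf)
            top = max(scores)
            if top == -math.inf:
                # Fully masked row (padding query): emit zeros.
                head_out.append([0.0] * d_v)
                continue
            # Numerically stable softmax; masked entries get weight 0.
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            head_out.append([
                sum(w * v[h][j][c] for j, w in enumerate(weights)) / total
                for c in range(d_v)
            ])
        out.append(head_out)
    return out


# Toy graph: one head, three node slots, the last slot is padding.
q = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
k = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
v = [[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]]
edge_bias = [[[0.0, 50.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
node_mask = [True, True, False]

out = reference_attention(
    q, k, v, make_score_mod(edge_bias), make_mask_mod(node_mask)
)
```

In the toy graph the queries and keys are all zero, so the attention pattern is driven entirely by the edge bias and the mask: node 0 attends almost exclusively to node 1, node 1 averages the two real nodes, and the padding slot receives a zero vector.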
Submission Number: 181