Edge-aware FlexAttention Network for Efficient Graph Generation

Vincent Jung; Alba Carballo-Castro; Yiming QIN; Lonneke van der Plas; Pascal Frossard

Published: 30 May 2026, Last Modified: 01 Jun 2026. Venue: SPIGM @ ICML (Poster). License: CC BY 4.0

Keywords: Graph generation, Diffusion, Graph Transformer

TL;DR: FlexAttention-enabled Graph Transformer for more efficient graph generation

Abstract: Graph generative models commonly use graph-specific Transformer architectures to jointly update node and edge features. However, these architectures do not directly benefit from recent hardware-aware attention implementations such as FlexAttention, which limits their scalability. Adapting them is non-trivial: edge and graph-level features must be injected into pairwise attention scores, graph masks must be respected, and dynamic edge updates remain a major source of quadratic memory usage. We propose a FlexAttention-compatible edge-aware Transformer for graph generation that incorporates structural information through head-wise score modulation inside the fused kernel, while updating edge representations with a lightweight residual mechanism that retains the key attention-score-dependent edge updates. This architecture achieves generation quality similar to that of our baseline while reducing peak GPU memory by around 65% and substantially improving training and sampling throughput. These results suggest that the proposed architecture is a practical step toward scalable graph foundation models.
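To make "head-wise score modulation" and "graph masks" concrete, the following is a minimal dense NumPy reference, not the authors' implementation and not a fused kernel. It assumes the structural signal enters as an additive per-head bias obtained by a linear projection of edge features; the names `edge_aware_attention`, `w_edge`, and `node_mask` are hypothetical and do not come from the paper. It materializes the full N x N score matrix, which is exactly the quadratic cost a fused kernel is meant to avoid. In FlexAttention, this kind of per-pair adjustment would be supplied through a user-defined score modification function rather than computed densely.

```python
import numpy as np

def edge_aware_attention(q, k, v, edge_feats, w_edge, node_mask):
    """Dense reference for attention with a head-wise additive edge bias.

    q, k, v:    (B, H, N, D) per-head node queries, keys, values
    edge_feats: (B, N, N, C) pairwise edge features
    w_edge:     (C, H) hypothetical projection giving one bias per head
    node_mask:  (B, N) boolean, True for real (non-padding) nodes;
                assumes every graph has at least one real node
    """
    d = q.shape[-1]
    scores = np.einsum("bhid,bhjd->bhij", q, k) / np.sqrt(d)
    # Head-wise score modulation: each head adds its own scalar per node pair.
    bias = np.einsum("bijc,ch->bhij", edge_feats, w_edge)
    scores = scores + bias
    # Graph mask: padded nodes can never be attended to.
    key_mask = node_mask[:, None, None, :]
    scores = np.where(key_mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = np.einsum("bhij,bhjd->bhid", weights, v)
    # Zero the outputs of padded query rows.
    out = out * node_mask[:, None, :, None]
    return out, weights

# Toy batch: the second graph has 3 real nodes and 2 padding slots.
rng = np.random.default_rng(0)
B, H, N, D, C = 2, 4, 5, 8, 3
q = rng.standard_normal((B, H, N, D))
k = rng.standard_normal((B, H, N, D))
v = rng.standard_normal((B, H, N, D))
edge_feats = rng.standard_normal((B, N, N, C))
w_edge = rng.standard_normal((C, H))
node_mask = np.array([[True] * 5, [True] * 3 + [False] * 2])

out, weights = edge_aware_attention(q, k, v, edge_feats, w_edge, node_mask)
```

The returned `weights` tensor has shape (B, H, N, N); a memory-efficient design would avoid storing it in full, which is why the attention-score-dependent edge update is described in the abstract as a lightweight residual mechanism.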

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 181
