Hilbert Attention for Image Generation with Diffusion Models

18 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Diffusion Model; Sparse Attention
TL;DR: HilbertA maps 2D image tokens onto a Hilbert curve for coalesced GPU memory access, and adds a layer-wise sliding schedule plus a small central shared region to maintain long-range and cross-tile context.
Abstract: Designing sparse attention for diffusion transformers requires reconciling two-dimensional spatial locality with GPU efficiency, a trade-off that existing methods struggle to achieve: they enforce 2D locality but often incur uncoalesced memory access. We present HilbertA, a 2D-aware, GPU-efficient sparse attention mechanism. HilbertA reorders image tokens along Hilbert curves, yielding a contiguous memory layout that preserves spatial neighborhoods, and employs a sliding schedule across layers to propagate long-range information without repeated or uncoalesced memory access. To further enhance cross-tile communication and positional awareness, HilbertA introduces a small central shared region. Implemented in Triton and evaluated on Flux.1-dev, HilbertA delivers attention speedups of $2.3\times$ when generating $1024 \times 1024$ images, and up to $4.17\times$ at $2048 \times 2048$, while achieving image quality comparable to or surpassing baselines, demonstrating the feasibility of hardware-aligned two-dimensional sparse attention for high-resolution image generation.
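The token reordering at the heart of this idea can be illustrated with the standard Hilbert index-to-coordinate mapping (a minimal sketch, not the paper's Triton implementation; the function name `d2xy` and the grid size are illustrative). Consecutive curve indices land on spatially adjacent cells, so a contiguous span of reordered tokens covers a compact 2D region, which is what makes coalesced memory access compatible with 2D locality:

```python
def d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two; d ranges over 0 .. n*n - 1.
    Standard iterative construction: at each scale s, extract the
    quadrant bits (rx, ry), rotate/flip the sub-square, and accumulate
    the offset.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so sub-curves join end to end
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Locality check: consecutive Hilbert indices are adjacent grid cells,
# so a contiguous token block stays a compact 2D neighborhood.
pts = [d2xy(8, d) for d in range(64)]
assert all(
    abs(x1 - x2) + abs(y1 - y2) == 1
    for (x1, y1), (x2, y2) in zip(pts, pts[1:])
)
```

Reordering a token sequence by this mapping (rather than raster order) is what lets a sliding window over the 1D sequence behave like a 2D neighborhood window.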
Primary Area: generative models
Submission Number: 11756