Keywords: attention, block attention, video generation, dynamic algorithm, theory, data structure
Abstract: Recent progress in video modeling has been largely driven by Transformer architectures, which model dependencies across spatial patches and temporal frames. However, compared to text or image modeling, video modeling involves input sequences that are orders of magnitude longer, which makes the attention mechanism the primary computational bottleneck. The naive method flattens $f$ frames of $n$ tokens each into a sequence of length $N = nf$, incurring a total attention cost of $O(n^2 f^2)$.
Prior work (e.g., radial/axial variants) attains subquadratic time only when either the spatial or temporal dimension is small. We present a dynamic algorithm that computes block attention in $O(\mathcal{T}_{\mathrm{mat}}(n, n, n^a) \cdot \frac{f}{n^{a}})$ amortized running time, where $a \in [0, 1)$.
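For reference, a minimal sketch of the naive baseline described in the abstract, which flattens all $f$ frames into one sequence and runs dense attention; the function name and toy dimensions are illustrative assumptions, not part of the paper, and this is not the proposed dynamic algorithm.

```python
# Illustrative sketch of the naive baseline: flatten f frames of n tokens
# into one sequence of length N = n * f and run full softmax attention.
# The score matrix alone has N^2 = n^2 * f^2 entries, matching the
# O(n^2 f^2) cost stated in the abstract.
import numpy as np

def naive_video_attention(Q, K, V):
    """Dense attention over the flattened (n * f, d) token sequence."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (N, N) score matrix
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (N, d) output

# Toy example (hypothetical sizes): f = 4 frames, n = 16 tokens/frame, d = 8.
f, n, d = 4, 16, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((n * f, d))               # flattened video tokens
out = naive_video_attention(X, X, X)
print(out.shape)                                  # (64, 8); scores are 64 x 64
```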
Primary Area: generative models
Submission Number: 15689