Forge: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention

ICLR 2026 Conference Submission 9282 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026, ICLR 2026, CC BY 4.0
Keywords: linear attention, efficiency, compiler, kernels
TL;DR: Forge is a domain-specific compiler that generates scalable kernels for diverse linear attention variants from a simple unified abstraction, delivering performance comparable to or better than expert-tuned libraries.
Abstract: The quadratic complexity of softmax attention poses a major bottleneck for long-context modeling, motivating a surge of linear attention variants with linear complexity. Unlike softmax attention, which benefits from optimized kernels, linear attention lacks general-purpose, hardware-efficient support and scalable distributed implementations. We introduce Forge, a domain-specific compiler that automates the generation of high-performance, scalable kernels for a wide range of linear attention models directly from high-level PyTorch code. At its core, Forge employs an intuitive programming abstraction that decomposes any linear attention algorithm into three canonical phases: intra-chunk computation, inter-chunk state propagation, and output merging. This unified abstraction enables Forge to perform domain-specific optimizations, automatically generating kernels that fuse computation and communication at a fine-grained tile level and eliminate host synchronization. Our evaluation demonstrates that Forge combines programmability with performance: a wide range of linear attention variants can be implemented in just a few dozen lines of code, while the generated kernels deliver 1.01x-4.9x the performance of a state-of-the-art expert-optimized library and scale with near-linear efficiency to 16 million tokens on 128 GPUs for scalar gated linear attention, surpassing the state-of-the-art distributed baseline by up to 7.2x.
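To illustrate the three-phase decomposition the abstract describes, below is a minimal sketch of plain chunked causal linear attention in PyTorch. This is not Forge's actual API or generated kernel; the function name, shapes, and chunk size are illustrative assumptions, and feature maps, gating, and normalization are omitted for brevity.

```python
import torch

def chunked_linear_attention(q, k, v, chunk_size=64):
    """q, k, v: (batch, seq_len, dim); seq_len assumed divisible by chunk_size.

    Illustrative sketch of the three canonical phases:
      (1) intra-chunk computation, (2) inter-chunk state propagation,
      (3) output merging.
    """
    B, T, D = q.shape
    C = chunk_size
    q = q.view(B, T // C, C, D)
    k = k.view(B, T // C, C, D)
    v = v.view(B, T // C, C, D)

    # Running K^T V state carried across chunks.
    state = torch.zeros(B, D, D, dtype=q.dtype, device=q.device)
    causal_mask = torch.tril(torch.ones(C, C, dtype=torch.bool, device=q.device))
    outputs = []

    for i in range(T // C):
        qi, ki, vi = q[:, i], k[:, i], v[:, i]

        # (1) Intra-chunk computation: causal attention within the chunk.
        scores = (qi @ ki.transpose(-1, -2)).masked_fill(~causal_mask, 0.0)
        intra = scores @ vi

        # (3) Output merging: combine intra-chunk output with the
        # contribution of all previous chunks via the propagated state.
        inter = qi @ state
        outputs.append(intra + inter)

        # (2) Inter-chunk state propagation: fold this chunk into the state.
        state = state + ki.transpose(-1, -2) @ vi

    return torch.cat(outputs, dim=1)
```

Under this view, a gated or decayed variant would only change how the intra-chunk scores are formed and how the state is updated, which is the kind of high-level specification Forge is described as compiling into fused, distributed kernels.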
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 9282