A Survey of Linear Attention: Algorithm, Theory, Application, and Infrastructure

TMLR Paper7404 Authors

08 Feb 2026 (modified: 18 Feb 2026) · Under review for TMLR · License: CC BY 4.0
Abstract: Large Language Models (LLMs) have proven effective at understanding and generating extremely long contexts. Recently, linear attention mechanisms have garnered significant interest because they reduce the quadratic computational complexity of traditional attention to linear complexity in token sequence length, thus balancing effectiveness and efficiency in LLM training and inference. This survey focuses on a broad spectrum of linear attention techniques, including traditional linear attention methods, the state space model (SSM) family, and linear recurrent neural networks (RNNs). These methods integrate historical information implicitly via state propagation, achieving a near-constant memory footprint and linear time complexity in sequence modeling tasks. Beyond algorithmic designs and model architectures, we further examine the characteristics, challenges, and successful applications of linear attention from a more comprehensive perspective. We also discuss the factors essential to practical deployment: hybrid frameworks, robust and efficient infrastructure, and the scenario-specific features of downstream tasks.
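The state-propagation idea summarized in the abstract can be illustrated with a minimal sketch of kernelized linear attention: a running matrix state replaces the full key-value cache, so each step costs O(d^2) time and memory regardless of sequence length. The feature map `phi` below (a shifted ReLU) is an illustrative assumption; the methods surveyed use a variety of feature maps, gates, and decay schemes.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Recurrent, linear-time form of kernelized linear attention.

    Minimal sketch (not any specific surveyed model):
      S_t = S_{t-1} + phi(k_t) v_t^T   -- constant-size state
      z_t = z_{t-1} + phi(k_t)         -- running normalizer
      o_t = phi(q_t) S_t / (phi(q_t) . z_t)
    """
    # Non-negative feature map; the small offset avoids division by zero
    # (an illustrative choice, not a canonical one).
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))  # accumulated key-value outer products
    z = np.zeros(d)                # accumulated feature-mapped keys
    out = np.empty_like(V)
    for t in range(T):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)
    return out
```

Unrolling the recurrence shows it equals causal attention with similarity `phi(q)·phi(k)` in place of `exp(q·k)`: the quadratic T×T score matrix never needs to be materialized, which is the source of the linear complexity the survey discusses.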
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Boyu_Wang3
Submission Number: 7404