The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures

ACL ARR 2025 July Submission 1248 Authors

29 Jul 2025 (modified: 31 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Transformers have dominated sequence processing tasks for the past seven years---most notably language modeling. However, the inherent quadratic complexity of their attention mechanism remains a significant bottleneck as context length increases. This paper surveys recent efforts to overcome this bottleneck, including advances in (sub-quadratic) attention variants, recurrent neural networks, state space models, and hybrid architectures. We critically analyze these approaches in terms of compute and memory complexity, benchmark results, and fundamental limitations to assess whether the dominance of pure-attention transformers may soon be challenged.
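As a minimal illustration (not drawn from the paper), the NumPy sketch below shows standard scaled dot-product attention; the (n, n) score matrix it materializes is the source of the quadratic time and memory cost in context length that the abstract refers to.

```python
# Minimal sketch of scaled dot-product attention for a single head,
# illustrating the quadratic cost: the (n, n) score matrix.
import numpy as np

def attention(Q, K, V):
    """Q, K, V: arrays of shape (n, d) for a sequence of length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # shape (n, n): O(n^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # shape (n, d)

n, d = 4096, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = attention(Q, K, V)   # doubling n quadruples the cost of the score matrix
```

Sub-quadratic alternatives surveyed in the paper (linear attention variants, recurrent and state space models) avoid forming this full (n, n) matrix, which is what the quadratic-versus-sub-quadratic distinction refers to.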
Paper Type: Long
Research Area: Generation
Research Area Keywords: efficient models, model architectures
Contribution Types: Surveys
Languages Studied: English
Submission Number: 1248