ASA: Adaptive Subquadratic Attention with Dynamic Complexity Adjustment

Published: 07 Jun 2025, Last Modified: 05 Aug 2025, Practical-DL 2025, CC BY 4.0
Keywords: hybrid attention; linear attention; softmax
Abstract: This paper introduces Adaptive Subquadratic Attention (ASA) with Dynamic Complexity Adjustment and Information Retrieval, a novel attention mechanism that improves computational efficiency while preserving model expressiveness by restructuring the query, key, and value (QKV) representations through linear projections and Softmax operations. ASA reduces attention complexity from $\mathcal{O}(n^2d)$ to $\mathcal{O}(nd^2 + ndm)$, where $m < d$, enabling more scalable sequence modeling by avoiding dense pairwise token interactions. Specifically, Q and K, initially in $\mathbb{R}^{n \times d}$, are independently projected into a lower-dimensional space $\mathbb{R}^{n \times m}$ using separate linear transformations, minimizing the cost of downstream operations. Softmax is then applied independently to the compressed Q and K to produce probability-like representations, retaining nonlinear attention behavior while avoiding the quadratic cost of pairwise similarity. Finally, ASA performs a two-stage attention process: the transformed key first summarizes the value matrix into a compact representation, which is then weighted by the transformed query to produce the final output, enabling efficient and structured information retrieval. Unlike standard Softmax attention, ASA avoids the quadratic computation of pairwise token interactions and decouples the Q/K interaction. Compared to linear attention, ASA preserves richer contextual representations by retaining the nonlinear selectivity of Softmax and maintaining probabilistic weighting. This structure enables both efficient attention computation and selective information retrieval, making ASA a compelling trade-off between speed and expressiveness. Empirical results show that ASA consistently outperforms leading Softmax- and linear-based attention mechanisms, including Transformer++, Mamba, and GLA, across a range of NLP tasks such as machine translation, question answering, and text summarization.
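
The following is a minimal sketch of the attention step described in the abstract, not the authors' implementation. It assumes softmax is applied over the feature axis of the compressed Q and over the sequence axis of the compressed K (a common choice in efficient-attention variants; the abstract does not specify the axes), and the class and layer names (`AdaptiveSubquadraticAttention`, `proj_q`, `proj_k`) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveSubquadraticAttention(nn.Module):
    """Sketch of the two-stage ASA computation under the assumptions above."""

    def __init__(self, d_model: int, m: int):
        super().__init__()
        assert m < d_model, "compressed dimension m should be smaller than d"
        # Separate linear projections that compress Q and K from d to m.
        self.proj_q = nn.Linear(d_model, m, bias=False)
        self.proj_k = nn.Linear(d_model, m, bias=False)

    def forward(self, q, k, v):
        # q, k, v: (batch, n, d)
        q_m = F.softmax(self.proj_q(q), dim=-1)  # (batch, n, m), softmax over features (assumption)
        k_m = F.softmax(self.proj_k(k), dim=1)   # (batch, n, m), softmax over sequence (assumption)
        # Stage 1: summarize the value matrix with the transformed keys -> (batch, m, d).
        context = torch.einsum("bnm,bnd->bmd", k_m, v)
        # Stage 2: weight the compact summary with the transformed queries -> (batch, n, d).
        return torch.einsum("bnm,bmd->bnd", q_m, context)


# Example: n = 1024 tokens, d = 64, m = 16; no n x n attention matrix is formed.
attn = AdaptiveSubquadraticAttention(d_model=64, m=16)
x = torch.randn(2, 1024, 64)
out = attn(x, x, x)  # (2, 1024, 64)
```

Both einsum contractions cost $\mathcal{O}(ndm)$, so the sequence-length dependence is linear rather than quadratic, consistent with the complexity stated in the abstract.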
Submission Number: 6