Quantum Attention: Fast Algorithms for Scalable Computation

19 Sept 2025 (modified: 01 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Attention Computation
Abstract: Large language models (LLMs) have revolutionized both academia and industry by leveraging attention mechanisms to achieve exceptional performance across diverse tasks. However, the quadratic complexity of the attention mechanism with respect to the input context length poses a significant challenge for scaling LLMs. Quantum computing offers computational advantages over classical methods for certain problems, yet its application to LLMs remains largely unexplored. In this work, we employ Grover's Search, a fundamental quantum algorithm, to efficiently compute sparse attention matrices, achieving a polynomial speed-up over classical approaches. Additionally, the quantum-generated attention matrices exhibit a low-rank structure, which can be leveraged to develop faster training algorithms for LLMs. We provide a comprehensive analysis of the algorithm's error rate and time complexity, demonstrating its potential to accelerate LLM computations while maintaining accuracy. Our findings indicate that quantum computing offers a promising pathway for optimizing the performance and scalability of large language models.
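To make the problem setting concrete, the sketch below is a classical reference implementation of the sparse attention target described in the abstract: only pre-softmax scores above a threshold are kept, and the softmax is restricted to those entries. A Grover-style search would aim to locate the surviving ("large") entries of each row with roughly sqrt(n) oracle queries rather than the n comparisons used here. The function name sparse_attention_classical and the threshold parameter tau are illustrative assumptions, not part of the paper; this is not the authors' quantum algorithm.

```python
import numpy as np

def sparse_attention_classical(Q, K, V, tau):
    """Classical baseline: keep only attention entries whose pre-softmax
    score exceeds tau, producing the sparse attention matrix that a
    Grover-style search would aim to find with fewer queries per row.

    Q, K, V: (n, d) arrays; tau: score threshold (illustrative parameter).
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)             # (n, n) pre-softmax scores
    mask = scores > tau                        # entries a quantum search would return
    mask |= np.eye(n, dtype=bool)              # keep the diagonal so no row is empty
    # Softmax restricted to the surviving entries of each row.
    masked = np.where(mask, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, mask

# Usage: n = 8 tokens, d = 4 dimensions.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out, mask = sparse_attention_classical(Q, K, V, tau=0.5)
print(out.shape, int(mask.sum()), "entries kept out of", mask.size)
```

Keeping the diagonal entry of every row is a design choice made here only to guarantee a well-defined restricted softmax; the paper's quantum construction and its low-rank structure are analyzed in the submission itself.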
Primary Area: learning theory
Submission Number: 15673