Keywords: transformers, attention, efficiency
TL;DR: efficient attention for transformers
Abstract: The attention mechanism is an important part of transformer architectures. It enables the network to compare samples within a sequence. Before the comparison is performed, tokens are multiplied by trainable matrices. These matrices can constitute a significant part of the total number of parameters. Their size creates problems on systems with limited cache in the compute unit, especially if the bandwidth between the compute unit and memory is limited. In particular, GPUs on mobile devices suffer from this double bottleneck.
Prior works mitigate this problem, for instance, by storing low-rank approximations, applying quantization, or minimizing the amount of data that needs to be transferred. In this paper, an alternative to the traditional attention mechanism is proposed that does not require any trainable matrices to perform the attention. The idea rests upon solving optimization problems, whereby memory is traded for compute. It will be shown, however, that the computational demand can be reduced such that auto-differentiation becomes possible. An experimental evaluation shows that the proposed algorithm performs favorably compared with several baselines.
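For reference, the trainable matrices the abstract refers to are the query, key, and value projections of standard attention. Below is a minimal sketch of that standard mechanism, assuming single-head attention; the model dimension and names (d_model, W_q, W_k, W_v) are illustrative assumptions, not taken from this submission, which proposes replacing these projections entirely.

```python
import numpy as np

# Minimal sketch of standard single-head attention (assumed baseline, not the
# proposed method). d_model and seq_len are illustrative values.
d_model, seq_len = 512, 16
rng = np.random.default_rng(0)

# Trainable projection matrices -- the parameters whose size causes the
# cache/bandwidth bottleneck described in the abstract.
W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
print("projection parameters per layer:", 3 * d_model * d_model)  # 786,432 here

x = rng.standard_normal((seq_len, d_model))      # token embeddings
q, k, v = x @ W_q, x @ W_k, x @ W_v              # projected tokens
scores = q @ k.T / np.sqrt(d_model)              # pairwise token comparisons
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
out = weights @ v                                # attention output, (seq_len, d_model)
```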
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12041