Keywords: transformers, attention, efficiency
TL;DR: efficient attention for transformers
Abstract: The attention mechanism is an important part of transformer architectures. It enables the network to compare samples within a sequence. Before the comparison is performed, tokens are multiplied by trainable matrices. These matrices can constitute a significant part of the total number of parameters. Their size creates problems on systems with limited cache in the compute unit, especially if the bandwidth between the compute unit and memory is limited. In particular, GPUs on mobile devices suffer from this double bottleneck.
Prior works mitigate this problem, for instance, by storing low-rank approximations, applying quantization, or minimizing the amount of data that needs to be transferred. In this paper, an alternative to the traditional attention mechanism is proposed that does not require any trainable matrices to perform the attention. The idea rests upon solving optimization problems, whereby memory is traded for compute. It will be shown, however, that the computational demand can be reduced such that auto-differentiation becomes possible. An experimental evaluation shows that the proposed algorithm performs favorably compared with several baselines.
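For reference, the trainable matrices the abstract refers to are the query, key, and value projections of standard attention. Below is a minimal sketch of that standard mechanism, assuming single-head attention; the model dimension and names (d_model, W_q, W_k, W_v) are illustrative assumptions, not taken from this submission, which proposes replacing these projections entirely.

```python
import numpy as np

# Minimal sketch of standard single-head attention (assumed baseline, not the
# proposed method). d_model and seq_len are illustrative values.
d_model, seq_len = 512, 16
rng = np.random.default_rng(0)

# Trainable projection matrices -- the parameters whose size causes the
# cache/bandwidth bottleneck described in the abstract.
W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
print("projection parameters per layer:", 3 * d_model * d_model)  # 786,432 here

x = rng.standard_normal((seq_len, d_model))      # token embeddings
q, k, v = x @ W_q, x @ W_k, x @ W_v              # projected tokens
scores = q @ k.T / np.sqrt(d_model)              # pairwise token comparisons
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
out = weights @ v                                # attention output, (seq_len, d_model)
```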
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12041