HLA: Hadamard Linear Attention

Published: 27 Apr 2026, Last Modified: 27 Apr 2026 · EDGE Poster · Readers: Everyone · CC BY 4.0
Keywords: linear attention, efficient attention, efficiency
TL;DR: A novel efficient attention mechanism, similar to linear attention but more powerful.
Abstract: The attention mechanism is an important reason for the success of transformers. To reduce the high computational cost of standard quadratic attention, linear attention has been proposed. It applies kernel functions to the inputs before the pairwise similarities are calculated. Although this allows for an efficient computational procedure, it reduces the expressive power of linear attention, leading to worse results than softmax-based attention. We propose Hadamard Linear Attention (HLA). In contrast to other works on linear attention, the nonlinearity in HLA is applied after the pairwise similarities have been computed, analogously to standard softmax attention. We derive an efficient computation scheme for the proposed method that is similar to that of standard linear attention. The effectiveness of the approach is demonstrated by applying it to a large diffusion transformer model for video generation, an application that involves very large numbers of tokens.
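
To make the distinction in the abstract concrete, the sketch below contrasts standard linear attention (kernel applied to the inputs before the similarities) with one example of a post-similarity nonlinearity that still admits a linear-time scheme: an elementwise (Hadamard) square of the similarity matrix, computable in linear time via the identity (q·k)² = ⟨vec(qq^T), vec(kk^T)⟩. This is an illustrative assumption chosen for exposition, not the exact HLA formulation from the paper; all function names here are hypothetical.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Standard linear attention: the kernel phi is applied to the inputs
    # BEFORE pairwise similarities, so K^T V can be accumulated first
    # and the cost is linear in the sequence length n.
    Qf, Kf = phi(Q), phi(K)          # (n, d)
    KV = Kf.T @ V                    # (d, d_v), independent of n pairs
    Z = Qf @ Kf.sum(axis=0)          # (n,), row normalizer
    return (Qf @ KV) / Z[:, None]

def post_similarity_attention(Q, K, V):
    # Illustrative sketch (NOT the paper's HLA): an elementwise square
    # applied AFTER the pairwise similarities, S = (Q K^T)^{∘2}.
    # Using (q·k)^2 = <vec(q q^T), vec(k k^T)>, the same output is
    # obtained with a linear-attention-style computation over a
    # d^2-dimensional feature map, i.e. without forming the n x n matrix S.
    n, d = Q.shape
    Qf = np.einsum('ni,nj->nij', Q, Q).reshape(n, d * d)  # vec(q q^T)
    Kf = np.einsum('ni,nj->nij', K, K).reshape(n, d * d)  # vec(k k^T)
    KV = Kf.T @ V
    Z = Qf @ Kf.sum(axis=0) + 1e-6
    return (Qf @ KV) / Z[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
    # Reference: explicit quadratic computation of the squared similarities.
    S = (Q @ K.T) ** 2
    ref = (S @ V) / S.sum(axis=1, keepdims=True)
    assert np.allclose(ref, post_similarity_attention(Q, K, V), atol=1e-4)
```

The point of the sketch is the ordering: some nonlinearities applied after the similarity computation (here, an elementwise polynomial) can still be rewritten as an inner product of lifted features, recovering the associativity that makes linear attention efficient.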
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 4