On Learning the Transformer Kernel

Sankalan Pal Chowdhury; Adamos Solomou; Avinava Dubey; MRINMAYA SACHAN

29 Sept 2021 (modified: 04 May 2025) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone

Keywords: Transformers, Kernel learning

Abstract: In this work we introduce the Kernelized Transformer, a generic, scalable, data-driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only enables learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear in the sequence length. We show that Kernelized Transformers achieve performance comparable to existing efficient Transformer architectures in terms of both accuracy and computational efficiency. Our study also demonstrates that the choice of kernel has a substantial impact on performance, and that kernel learning variants are competitive alternatives to fixed-kernel Transformers on both long- and short-sequence tasks.
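The core mechanism described in the abstract, spectral feature maps whose dot product approximates a kernel plus the reordering of matrix products that makes attention linear in sequence length, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: the spectral distribution is a single diagonal Gaussian with hypothetical parameters `mu` and `log_sigma`, the features are trigonometric random Fourier features, and the row normalization of attention is omitted for brevity. By Bochner's theorem, frequencies drawn from the spectral distribution give features whose dot product approximates the corresponding shift-invariant kernel, and computing `phi(K)^T V` first yields the same output as the explicit n × n score matrix without ever forming it.

```python
import numpy as np

def sample_spectral(mu, log_sigma, num_features, rng):
    """Draw frequencies omega ~ N(mu, diag(exp(log_sigma)^2)) by reparameterization.

    mu and log_sigma are hypothetical stand-ins for learnable parameters of the
    spectral distribution; with an autodiff framework, gradients would flow
    through them because the randomness lives only in eps.
    """
    eps = rng.standard_normal((num_features, mu.shape[0]))
    return mu + np.exp(log_sigma) * eps  # shape (M, d)

def fourier_features(x, omega):
    """Trig features with phi(x) . phi(y) = mean_i cos(omega_i . (x - y))."""
    proj = x @ omega.T  # (n, M)
    m = omega.shape[0]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(m)

def linear_attention(q, k, v, omega):
    """Unnormalized attention in O(n M d_v): form phi(K)^T V first."""
    kv = fourier_features(k, omega).T @ v  # (2M, d_v), independent of n^2
    return fourier_features(q, omega) @ kv  # (n, d_v)

def quadratic_attention(q, k, v, omega):
    """Reference O(n^2) computation with the same approximate kernel."""
    scores = fourier_features(q, omega) @ fourier_features(k, omega).T  # (n, n)
    return scores @ v

rng = np.random.default_rng(0)
d, n, M = 4, 6, 20000
# mu = 0, sigma = 1 corresponds to the Gaussian kernel exp(-|x - y|^2 / 2).
mu, log_sigma = np.zeros(d), np.zeros(d)
omega = sample_spectral(mu, log_sigma, M, rng)

# Monte Carlo kernel approximation versus the exact Gaussian kernel.
x = 0.5 * rng.standard_normal((n, d))
approx = fourier_features(x, omega) @ fourier_features(x, omega).T
diff = x[:, None, :] - x[None, :, :]
exact = np.exp(-0.5 * np.sum(diff ** 2, axis=-1))
print("max kernel error:", np.abs(approx - exact).max())

# Linear-time and quadratic-time attention agree up to floating-point error.
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out_linear = linear_attention(q, k, v, omega)
out_quad = quadratic_attention(q, k, v, omega)
print("outputs match:", np.allclose(out_linear, out_quad))
```

Changing `mu` and `log_sigma` changes which kernel the features approximate; learning them end-to-end is one way to read "learning the kernel by learning the spectral distribution," though the parameterization here is only an assumption chosen for illustration.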

One-sentence Summary: We explore how kernel learning can help improve Transformers.

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/on-learning-the-transformer-kernel/code)