Thinformer: Guaranteed Attention Approximation via Low-Rank Thinning

Published: 10 Jun 2025, Last Modified: 10 Jun 2025, LCFM 2025, CC BY 4.0
Keywords: sub-gaussian, thinning, distribution compression, low-rank, fast attention
TL;DR: A new low-rank analysis of thinning algorithms enables guaranteed fast attention approximation.
Abstract: The goal of thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of these techniques, we design practical sub-Gaussian thinning approaches that improve upon the best-known guarantees for approximating attention in transformers.
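For intuition, the sketch below illustrates the general idea of approximating softmax attention by attending over a thinned subset of key/value pairs. It is not the paper's algorithm: uniform subsampling stands in for the sub-Gaussian thinning step, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def thinned_attention(Q, K, V, m, seed=None):
    """Approximate attention using a thinned set of m key/value pairs.

    Here "thinning" is plain uniform subsampling, a stand-in for a
    sub-Gaussian thinning algorithm. The softmax over the kept keys is a
    self-normalized estimate of the full attention output.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # m representative points
    return softmax_attention(Q, K[idx], V[idx])

# Toy usage: approximation quality improves as the summary size m grows.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
exact = softmax_attention(Q, K, V)
approx = thinned_attention(Q, K, V, m=64, seed=1)
print("max abs error:", np.abs(exact - approx).max())
```

A sub-Gaussian thinning rule would replace the uniform draw of `idx`, selecting summary points whose induced error concentrates much faster than uniform subsampling allows.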
Submission Number: 6