Low-Rank Thinning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: New low-rank analysis of thinning algorithms enables applications to fast attention, stochastic gradient reordering, and testing with deep kernels.
Abstract: The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
Lay Summary: The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, recently developed thinning algorithms can match the quality of sampling without replacement while substantially reducing the number of summary points. However, existing guarantees are overly restrictive and pessimistic. To address these deficiencies, we introduce a new analysis of thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical thinning approaches that improve upon the best known guarantees for approximating the quadratic-time computations in neural networks, speeding up model training through example reordering, and rapidly detecting salient differences between datasets.
Link To Code: https://github.com/microsoft/thinformer
Primary Area: Theory->Probabilistic Methods
Keywords: sub-gaussian, thinning, distribution compression, kernel maximum mean discrepancy, low-rank, fast attention, sgd reordering, deep hypothesis testing
Submission Number: 4910
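
The abstract and keywords measure summary quality with the kernel maximum mean discrepancy (MMD) and use uniform subsampling as the baseline that sub-Gaussian thinning aims to beat at a smaller summary size. The sketch below is a minimal, self-contained illustration of that quality measure only, not the paper's algorithm or the thinformer repository's API: it computes the Gaussian-kernel MMD between a full dataset and a uniform subsample. The Gaussian kernel, bandwidth, data, and summary size are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise Gaussian RBF kernel matrix between the rows of X and Y.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    # Empirical (biased) kernel maximum mean discrepancy between point sets.
    val = (
        gaussian_kernel(X, X, bandwidth).mean()
        - 2.0 * gaussian_kernel(X, Y, bandwidth).mean()
        + gaussian_kernel(Y, Y, bandwidth).mean()
    )
    return np.sqrt(max(val, 0.0))  # guard against tiny negative round-off

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 2))  # full dataset of n points (assumed toy data)

# Uniform-subsampling baseline: keep roughly sqrt(n) points chosen at random.
m = int(np.sqrt(len(X)))
coreset = X[rng.choice(len(X), size=m, replace=False)]

print(f"MMD between full data and {m}-point uniform subsample:",
      mmd(X, coreset))
```

A thinning algorithm of the kind the paper studies would be evaluated the same way: produce an m-point summary and report its MMD to the full dataset, with lower values indicating a better compression.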