Low-Rank Thinning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: New low-rank analysis of thinning algorithms enables applications to fast attention, stochastic gradient reordering, and testing with deep kernels.
Abstract: The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
Lay Summary: The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, recently developed thinning algorithms can match the quality of sampling without replacement while substantially reducing the number of summary points. However, existing guarantees are overly restrictive and pessimistic. To address these deficiencies, we introduce a new analysis of thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical thinning approaches that improve upon the best known guarantees for approximating the quadratic-time computations in neural networks, speeding up model training through example reordering, and rapidly detecting salient differences between datasets.
Link To Code: https://github.com/microsoft/thinformer
Primary Area: Theory->Probabilistic Methods
Keywords: sub-gaussian, thinning, distribution compression, kernel maximum mean discrepancy, low-rank, fast attention, sgd reordering, deep hypothesis testing
Submission Number: 4910
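
The abstract and keywords measure summary quality with the kernel maximum mean discrepancy (MMD) and use uniform subsampling as the baseline that sub-Gaussian thinning aims to beat at a smaller summary size. The sketch below is a minimal, self-contained illustration of that quality measure only, not the paper's algorithm or the thinformer repository's API: it computes the Gaussian-kernel MMD between a full dataset and a uniform subsample. The Gaussian kernel, bandwidth, data, and summary size are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise Gaussian RBF kernel matrix between the rows of X and Y.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    # Empirical (biased) kernel maximum mean discrepancy between point sets.
    val = (
        gaussian_kernel(X, X, bandwidth).mean()
        - 2.0 * gaussian_kernel(X, Y, bandwidth).mean()
        + gaussian_kernel(Y, Y, bandwidth).mean()
    )
    return np.sqrt(max(val, 0.0))  # guard against tiny negative round-off

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 2))  # full dataset of n points (assumed toy data)

# Uniform-subsampling baseline: keep roughly sqrt(n) points chosen at random.
m = int(np.sqrt(len(X)))
coreset = X[rng.choice(len(X), size=m, replace=False)]

print(f"MMD between full data and {m}-point uniform subsample:",
      mmd(X, coreset))
```

A thinning algorithm of the kind the paper studies would be evaluated the same way: produce an m-point summary and report its MMD to the full dataset, with lower values indicating a better compression.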