Track: tiny paper (up to 4 pages)
Keywords: Neural scaling law, model compression, lottery ticket hypothesis, deep learning theory
TL;DR: We prove that permutation symmetry enables polylogarithmic compression of neural networks and datasets, thus establishing the dynamical lottery ticket hypothesis and accelerating neural scaling laws
Abstract: When training large-scale models, performance typically improves with parameter count and dataset size only via a slow power law. We show that comparable performance can be achieved with far smaller models and much less data. We prove that a generic permutation-invariant function of $d$ objects can be compressed into a function of $\operatorname{polylog} d$ objects with vanishing error, and that this rate is optimal. This yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the model’s loss landscape unchanged. (Ia) establishes the \textit{dynamical} lottery ticket hypothesis: ordinary networks can be strongly compressed without changing their learning dynamics or outcomes. (Ib) shows that a scaling law $L\sim d^{-\alpha}$ can be accelerated to an arbitrarily fast power-law decay, and ultimately to $\exp\left(-\alpha' \sqrt[m]{d}\right)$.
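A minimal worked illustration of the scaling-law acceleration implied by (Ib), assuming the compressed dataset size grows as $n \approx c(\log d)^{m}$ for some constants $c, m$ (the symbols $n$ and $c$ are illustrative and not fixed by the abstract): if loss follows $L(d)\sim d^{-\alpha}$ in the original dataset size $d$, then rewriting the loss in terms of the compressed size $n$ gives
$$
n \approx c(\log d)^{m} \;\Longrightarrow\; d \approx \exp\!\big((n/c)^{1/m}\big), \qquad
L(n) \sim \exp\!\big(-\alpha (n/c)^{1/m}\big) = \exp\!\big(-\alpha' \sqrt[m]{n}\big), \quad \alpha' = \alpha\, c^{-1/m},
$$
i.e., a power law in $d$ becomes a stretched-exponential decay in the compressed size, matching the rate stated in the abstract.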
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 14