Track: tiny paper (up to 4 pages)
Keywords: Neural scaling law, model compression, lottery ticket hypothesis, deep learning theory
TL;DR: We prove that permutation symmetry enables polylogarithmic compression of neural networks and datasets, thus establishing the dynamical lottery ticket hypothesis and accelerating neural scaling laws
Abstract: When training large-scale models, performance typically improves with parameter count and dataset size only via a slow power law. We show that comparable performance can be achieved with far smaller models and much less data. We prove that a generic permutation-invariant function of $d$ objects can be compressed into a function of $\operatorname{polylog} d$ objects with vanishing error, and that this rate is optimal. This yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the model’s loss landscape unchanged. (Ia) establishes the \textit{dynamical} lottery ticket hypothesis: ordinary networks can be strongly compressed without changing their learning dynamics or outcomes. (Ib) shows that a scaling law $L\sim d^{-\alpha}$ can be accelerated to an arbitrarily fast power-law decay, and ultimately to $\exp\left(-\alpha' \sqrt[m]{d}\right)$.
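A minimal worked illustration of the scaling-law acceleration implied by (Ib), assuming the compressed dataset size grows as $n \approx c(\log d)^{m}$ for some constants $c, m$ (the symbols $n$ and $c$ are illustrative and not fixed by the abstract): if loss follows $L(d)\sim d^{-\alpha}$ in the original dataset size $d$, then rewriting the loss in terms of the compressed size $n$ gives
$$
n \approx c(\log d)^{m} \;\Longrightarrow\; d \approx \exp\!\big((n/c)^{1/m}\big), \qquad
L(n) \sim \exp\!\big(-\alpha (n/c)^{1/m}\big) = \exp\!\big(-\alpha' \sqrt[m]{n}\big), \quad \alpha' = \alpha\, c^{-1/m},
$$
i.e., a power law in $d$ becomes a stretched-exponential decay in the compressed size, matching the rate stated in the abstract.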
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 14