Keywords: Linear Mode Connectivity, Sparse Neural Networks, Dataset Distillation, Neural Network Pruning, Synthetic Data
TL;DR: We show that distilled data, a synthetic summarization of the real training set, can guide Iterative Magnitude Pruning to find subnetworks stable to SGD noise.
Abstract: With the rise of interest in sparse neural networks, we study how neural network pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP), unveils a new class of sparse networks that are more stable to SGD noise on the real data than either the dense model or subnetworks found by IMP with real data. That is, synthetically chosen subnetworks often train to the same minima, i.e., they exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measurements of the diagonal of the Hessian. While dataset distillation is still a young field, we find that these properties allow synthetic subnetworks to match the performance of traditional IMP with up to 150x fewer training points in settings where distilled data applies.
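The abstract's central measurement is linear mode connectivity: whether two SGD runs of the same subnetwork can be joined by a straight line in weight space without a loss barrier. Below is a minimal PyTorch sketch of that interpolation check, not the authors' code; `make_model`, `eval_loss`, and the two trained copies `model_a`/`model_b` are hypothetical placeholders.

```python
import torch

def interpolate_state_dicts(state_a, state_b, alpha):
    """Parameter-wise convex combination: (1 - alpha) * state_a + alpha * state_b."""
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

def linear_interpolation_losses(model_a, model_b, make_model, eval_loss, steps=11):
    """Evaluate the loss at evenly spaced points on the line between two solutions.

    If the curve shows no bump above the endpoint losses, the two SGD runs are
    linearly mode connected (stable to SGD noise in the sense of the abstract).
    Note: batch-norm buffers may need special handling in practice.
    """
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        model = make_model()  # fresh model with the same architecture (and pruning mask)
        model.load_state_dict(interpolate_state_dicts(state_a, state_b, alpha))
        with torch.no_grad():
            losses.append(eval_loss(model))  # loss on the real data
    barrier = max(losses) - max(losses[0], losses[-1])
    return losses, barrier
```

A barrier near zero along this path is the behavior the abstract reports for subnetworks found by IMP on distilled data, in contrast to the dense model or real-data IMP subnetworks.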
Track: Extended Abstract Track
Submission Number: 23