The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge

Published: 16 Jun 2024, Last Modified: 20 Jul 2024
HiLD at ICML 2024 Poster
License: CC BY 4.0
Keywords: loss landscape, linear mode connectivity, stochastic gradient descent, weight distance, neural network permutation symmetry
TL;DR: We quantify the sensitivity of neural networks to a perturbation at a single iteration of training.
Abstract: Neural network training begins with a chaotic phase in which the network is sensitive to small perturbations, such as those caused by stochastic gradient descent (SGD). This sensitivity can cause identically initialized networks to diverge both in parameter space and in functional behavior. However, the exact degree to which networks are sensitive to perturbation, and how that sensitivity changes as networks transition out of the chaotic phase, are unclear. To address this uncertainty, we apply a controlled perturbation at a single point in training time and measure its effect on otherwise identical training trajectories. We find that both the $L^2$ distance and the loss barrier (the increase in loss on the linear path between two networks) for networks trained in this manner increase with the magnitude of the perturbation and with how early in training it is applied. Finally, we propose a conjecture relating the sensitivity of a network to how easily it can be permuted with respect to another network.
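To make the two quantities measured in the abstract concrete, the sketch below gives a minimal NumPy implementation of the $L^2$ distance between two networks and the loss barrier on the linear path between them, following the standard linear-mode-connectivity formulation of the barrier. This is an illustrative sketch, not the paper's code; `loss_fn`, `params_a`, and `params_b` are hypothetical placeholders for a loss evaluation function and two flattened parameter vectors.

```python
import numpy as np

def l2_distance(params_a, params_b):
    """Euclidean (L^2) distance between two flattened parameter vectors."""
    return np.linalg.norm(params_a - params_b)

def loss_barrier(loss_fn, params_a, params_b, num_points=25):
    """Maximum increase in loss along the linear path between two networks,
    measured relative to the linear interpolation of the endpoint losses."""
    loss_a = loss_fn(params_a)
    loss_b = loss_fn(params_b)
    barrier = 0.0
    for alpha in np.linspace(0.0, 1.0, num_points):
        # Network on the linear path: alpha * a + (1 - alpha) * b.
        interpolated = alpha * params_a + (1.0 - alpha) * params_b
        # Loss the path would have if it were flat between the endpoints.
        baseline = alpha * loss_a + (1.0 - alpha) * loss_b
        barrier = max(barrier, loss_fn(interpolated) - baseline)
    return barrier
```

A barrier near zero indicates the two networks are linearly mode connected; a large barrier indicates the perturbation pushed the trajectories into different basins of the loss landscape.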
Student Paper: Yes
Submission Number: 79