Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization

NeurIPS 2023 Workshop ATTRIB Submission 22 Authors

Published: 27 Oct 2023, Last Modified: 08 Dec 2023
ATTRIB Oral
Keywords: neural network optimization, progressive sharpening, edge of stability, adaptive gradient methods, batch normalization
TL;DR: We show the outsized influence of samples with large, opposing features that dominate a network's output, demonstrating the relative importance of small subsets of the training data for the model's predictions at various stages of training.
Abstract: We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics and demonstrates how a small number of training points can have an unusually large effect on a network's optimization trajectory and predictions. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong \emph{opposing signals}: consistent, large-magnitude features which dominate the network output and occur in both groups with similar frequency. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high loss on one group and then the other, until the overall loss spikes. We complement these experiments with a theoretical analysis of a two-layer linear network on a simple model of opposing signals. Our finding enables new qualitative predictions of behavior during and after training, which we confirm experimentally. It also provides a new lens through which to study how specific data influence the learned parameters.
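As a rough illustration of the dynamics the abstract describes, here is a minimal sketch (ours, not the authors' code; the dataset, dimensions, and learning rate are invented for illustration): a two-layer linear network is trained by full-batch gradient descent on two small groups that share one large, dominant feature but carry opposite labels. Near the stability threshold the two groups' losses see-saw out of phase; below it, optimization settles into the narrow valley that balances them.

import numpy as np

rng = np.random.default_rng(0)

# Toy "opposing signals" data (illustrative only): both groups share one
# large-magnitude feature, but their labels disagree, so that feature pulls
# the network's output in opposite directions.
d, n, scale = 20, 5, 10.0          # input dim, points per group, feature size
v = np.zeros(d)
v[0] = 1.0                         # the dominant shared feature direction
X = scale * v + 0.1 * rng.standard_normal((2 * n, d))
y = np.concatenate([np.ones(n), -np.ones(n)])   # opposing labels

# Two-layer linear network f(x) = w2 . (W1 x), squared loss, full-batch GD.
h = 10
W1 = rng.standard_normal((h, d)) / np.sqrt(d)
w2 = rng.standard_normal(h) / np.sqrt(h)
lr = 6e-3   # near the stability edge in this toy; larger values make the loss spike

for step in range(10):
    pred = (X @ W1.T) @ w2                       # network outputs
    err = pred - y
    print(f"step {step}  loss A {np.mean(err[:n]**2):8.3f}  "
          f"loss B {np.mean(err[n:]**2):8.3f}")
    g = 2.0 * err / len(y)                       # d(mean squared loss)/d(pred)
    grad_w2 = W1 @ (X.T @ g)                     # backprop through hidden layer
    grad_W1 = np.outer(w2, X.T @ g)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

Because both groups present nearly the same dominant input, the loss-minimizing output on that feature is the midpoint between the two labels, i.e. the "narrow valley" that balances the groups; depending on the random draw and step size, the per-group losses alternate which one is higher before settling there, and raising lr past the stability threshold reproduces the growing oscillation and loss spike the abstract describes.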
Submission Number: 22