Universality Patterns in the Training of Neural Networks

Raghav Somani; Navin Goyal; Prateek Jain; Praneeth Netrapalli

17 May 2019 (modified: 05 May 2023); Submitted to ICML Deep Phenomena 2019; Readers: Everyone

Keywords: Universality, surrogate loss, neural networks, deep learning

TL;DR: We identify some universal patterns (i.e., holding across architectures) in the behavior of different surrogate losses (CE, MSE, 0-1 loss) while training neural networks and present supporting empirical evidence.

Abstract: This paper proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross entropy, mean squared error, and 0/1 error) evaluated for a model arising at any point of a training run. This pattern is universal in the sense that the one-to-one relationship is identical across architectures (such as VGG, ResNet, and DenseNet), algorithms (SGD and SGD with momentum), and training loss functions (cross entropy and mean squared error).
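To make concrete what "a pair of losses evaluated for a model" could mean, here is a minimal sketch that computes cross entropy, mean squared error, and 0/1 error on the same set of predictions. The function names `softmax` and `evaluate_losses` are hypothetical, and computing MSE between softmax probabilities and one-hot labels is an assumption; the paper may define these quantities differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate_losses(all_logits, labels):
    """Average CE, MSE, and 0/1 error over a batch of examples.

    Assumption: MSE is taken between softmax probabilities and the
    one-hot label vector; this is an illustrative choice only.
    """
    n = len(labels)
    ce = mse = zero_one = 0.0
    for logits, y in zip(all_logits, labels):
        probs = softmax(logits)
        # Cross entropy: negative log-probability of the true class
        # (clamped to avoid log(0) on underflow).
        ce += -math.log(max(probs[y], 1e-12))
        # Squared distance between probabilities and the one-hot target.
        mse += sum(
            (p - (1.0 if j == y else 0.0)) ** 2 for j, p in enumerate(probs)
        )
        # 0/1 error: 1 if the top-scoring class is not the true class.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        zero_one += float(predicted != y)
    return {"ce": ce / n, "mse": mse / n, "zero_one": zero_one / n}
```

Under these assumptions, one way to probe the claimed relationship would be to call `evaluate_losses` on a fixed evaluation set at several checkpoints of several training runs and plot one loss against another; a one-to-one relation would appear as all points falling on a single curve.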
