Exploring the development of complexity over depth and time in deep neural networks

Published: 16 Jun 2024, Last Modified: 20 Jul 2024, HiLD at ICML 2024 Poster, License: CC BY 4.0
Keywords: deep learning, neural networks, simplicity bias, learning dynamics
Abstract: Neural networks obtain their expressivity from nonlinear activation functions. While it is often assumed that the implemented transformations are effectively nonlinear at every layer, recent studies indicate that the overall function implemented by the network is close to linear at the start of training, only becoming more complex as training progresses. It is unclear how the evolution of the overall function during training relates to changes in the effective (non)linearity of the individual network layers. In this study, we investigate these changes in effective (non)linearity over time and depth, i.e., over updates during training and per layer. We present a straightforward way to assess the effective linearity of layers through the use of partly linear models; for an 18-layer nonlinear convolutional neural network (CNN) trained on the ImageNet dataset, we find that a large fraction of the layers start out in the effectively linear regime, and that layers become effectively nonlinear in the direction from deep to shallow layers. The evolution over depth and time thus follows a distinct, wave-like pattern. We also propose an alternative method that reveals this evolution in a computationally efficient way, and we extend our experimental results to the ResNet-50 architecture. The simple techniques we propose could help provide valuable insights into the relationship between depth, training time, and the complexity of the function a neural network implements.
Student Paper: No
Submission Number: 70
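The abstract describes probing the effective linearity of individual layers with "partly linear" models. Below is a minimal sketch of that idea, assuming a PyTorch CNN with ReLU activations; the paper's exact procedure, layer selection, and linearity criterion may differ. The sketch swaps the ReLUs of a chosen set of layers for the identity and measures how much the network output changes: a small change suggests those layers are operating in an effectively linear regime.

```python
# Hedged sketch: assess effective layer linearity via a "partly linear" probe.
# Assumptions (not from the paper): PyTorch, ReLU activations, relative output
# change as the linearity measure.

import copy
import torch
import torch.nn as nn


def linearize_layers(model, layer_names):
    """Return a copy of `model` with the ReLU modules named in `layer_names`
    replaced by identity maps, making those layers effectively linear."""
    probe = copy.deepcopy(model)
    targets = [name for name, module in probe.named_modules()
               if name in layer_names and isinstance(module, nn.ReLU)]
    for name in targets:
        parent = probe
        *path, leaf = name.split(".")
        for attr in path:
            parent = getattr(parent, attr)
        setattr(parent, leaf, nn.Identity())  # remove the nonlinearity
    return probe


@torch.no_grad()
def relative_output_change(model, probe, x):
    """Relative change in network output when the chosen layers are linearized;
    small values indicate those layers are effectively linear on this input."""
    y_full, y_lin = model(x), probe(x)
    return ((y_full - y_lin).norm() / y_full.norm()).item()


# Hypothetical usage with a torchvision ResNet-18 checkpoint:
# from torchvision.models import resnet18
# model = resnet18(weights="IMAGENET1K_V1").eval()
# probe = linearize_layers(model, {"layer4.0.relu", "layer4.1.relu"})
# x = torch.randn(8, 3, 224, 224)
# print(relative_output_change(model, probe, x))
```

Repeating this probe per layer (or per block) at successive training checkpoints gives a depth-by-time map of effective (non)linearity, which is the kind of picture in which the wave-like pattern described in the abstract would appear.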