Keywords: neural networks, deep learning, TP-agreement, learning order, spectral bias, simplicity bias, easy examples, hard examples, principal components, machine learning, linear networks, over-parametrized networks
TL;DR: The eigendecomposition of the data provably determines the dynamics of deep linear networks. Empirically, it is shown to govern the dynamics of non-linear networks early in training, and to streamline the order in which examples are learned.
Abstract: Recent work suggests that convolutional neural networks of different architectures learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our asymptotic analysis, assuming that the hidden layers are wide enough, reveals that the convergence rate of this model's parameters is exponentially faster along directions corresponding to the larger principal components of the data, at a rate governed by the singular values. We term this convergence pattern the Principal Components bias (PC-bias). We show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently at earlier stages of learning. We then compare our results to the spectral bias, showing that both biases can be observed independently and affect the order of learning in different ways. Finally, we discuss how the PC-bias may explain some benefits of early stopping and its connection to PCA, and why deep networks converge more slowly when given random labels.
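The following is a minimal sketch (not the authors' code) of the PC-bias described above: an over-parametrized deep linear network, i.e. a stack of linear layers with no activations, is trained on synthetic Gaussian data whose covariance is diagonal with a decaying spectrum, so the principal components are the coordinate axes. Tracking the end-to-end linear map against a linear teacher shows the error along directions with larger eigenvalues shrinking faster. All dimensions, learning rates, and variable names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n, width = 20, 2000, 256                  # input dim, samples, hidden width
eigvals = torch.logspace(0, -2, d)           # data variances from 1.0 down to 0.01
X = torch.randn(n, d) * eigvals.sqrt()       # covariance ~ diag(eigvals): PCs are the axes
w_star = torch.randn(d)                      # linear teacher
y = X @ w_star

# Over-parametrized deep *linear* student: three linear layers, no activations.
net = nn.Sequential(
    nn.Linear(d, width, bias=False),
    nn.Linear(width, width, bias=False),
    nn.Linear(width, 1, bias=False),
)
opt = torch.optim.SGD(net.parameters(), lr=0.05)

def per_pc_error():
    # Compare the network's end-to-end linear map to the teacher, per principal axis.
    with torch.no_grad():
        w_eff = net[2].weight @ net[1].weight @ net[0].weight   # shape (1, d)
        return (w_eff.squeeze(0) - w_star).abs()

for step in range(2001):
    opt.zero_grad()
    loss = ((net(X).squeeze(1) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        err = per_pc_error()
        print(f"step {step:5d}  loss {loss.item():.4f}  "
              f"err@largest-PC {err[0]:.4f}  err@smallest-PC {err[-1]:.4f}")
```

Under this setup, the printed error along the largest principal component typically drops well before the error along the smallest one, consistent with convergence rates governed by the data's singular values.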
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf