- Abstract: We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a very universal Frequency Principle (F-Principle) --- DNNs often fit target functions from low to high frequencies --- on high-dimensional benchmark datasets, such as MNIST/CIFAR10, and deep networks, such as VGG16. This F-Principle of DNNs is opposite to the learning behavior of most conventional iterative numerical schemes (e.g., Jacobi method), which exhibits faster convergence for higher frequencies, for various scientific computing problems. With a naive theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of good generalization of DNNs on most real datasets and bad generalization of DNNs on parity function or randomized dataset.
- Keywords: deep learning, training behavior, Fourier analysis, generalization
- TL;DR: In real problems, we found that DNNs often fit target functions from low to high frequencies during the training process.