Training Data Size Induced Double Descent for Denoising Feedforward Neural Networks and the Role of Training Noise
Abstract: We show that, for an unregularized denoising feedforward neural network, the generalization error as a function of the number of training data points follows a double descent curve. We formalize the question of how many training data points should be used by studying the generalization error for denoising noisy test data. Prior work on computing the generalization error focuses on adding noise to the target outputs; adding noise to the inputs, however, is more in line with current pre-training practices. In the regime where the network is linear in its inputs, we provide an asymptotically exact formula for the generalization error for rank-1 data and an approximate formula for rank-$r$ data. From these, we derive a formula for the amount of noise that should be added to the training data to minimize the denoising error. This reveals a shrinkage phenomenon: the performance of denoising networks improves when the training SNR is made smaller than the test SNR. Moreover, the amount of shrinkage (the ratio of the training SNR to the test SNR) itself follows a double descent curve.
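The sample-wise double descent described in the abstract can be reproduced in a minimal simulation. The sketch below is an illustrative toy setup, not the paper's exact model or formulas: clean data of rank 1, i.i.d. Gaussian input noise, and a min-norm least-squares linear denoiser $W$ fit to map noisy inputs to clean targets. The dimension `d`, noise level `sigma`, and trial counts are arbitrary choices for illustration; the test error is expected to peak near the interpolation threshold, where the number of training points equals the input dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_test, trials = 20, 0.5, 500, 20

# Fixed rank-1 signal direction (assumed toy setup).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

def denoise_error(n_train):
    """Average test MSE of a min-norm linear denoiser trained on n_train points."""
    errs = []
    for _ in range(trials):
        c_tr = rng.standard_normal(n_train)          # signal coefficients
        X_tr = np.outer(u, c_tr)                     # clean rank-1 data, shape (d, n_train)
        Y_tr = X_tr + sigma * rng.standard_normal((d, n_train))  # noisy inputs
        # Min-norm least-squares solution of W @ Y_tr ~= X_tr.
        W = X_tr @ np.linalg.pinv(Y_tr)
        c_te = rng.standard_normal(n_test)
        X_te = np.outer(u, c_te)
        Y_te = X_te + sigma * rng.standard_normal((d, n_test))
        errs.append(np.mean((W @ Y_te - X_te) ** 2))
    return float(np.mean(errs))

# Sweep the training-set size through the interpolation threshold n = d.
ns = [5, 10, 18, 20, 22, 40, 100]
curve = {n: denoise_error(n) for n in ns}
```

In this sketch the error spike at `n = d` comes from the near-singularity of the noisy input matrix (its smallest singular value approaches zero at aspect ratio one), which inflates the pseudoinverse and hence the learned denoiser.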
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Michael_U._Gutmann1
Submission Number: 781