Training Data Size Induced Double Descent For Denoising Neural Networks and the Role of Training Noise Level
Keywords: Double Descent, Denoising Neural Networks, High Dimensional Statistics.
Abstract: We show that, when training a denoising neural network, more data is not always beneficial: the generalization error as a function of the number of training data points follows a double descent curve.
Training a network to denoise noisy inputs is the most widely used technique for pre-training deep neural networks. Hence, an important question is the effect of scaling the number of training data points. We formalize the question of how many data points should be used by studying the generalization error for denoising noisy test data. Prior work on computing the generalization error focuses on adding noise to the target outputs. However, adding noise to the input is more in line with current pre-training practices. In the linear (in the inputs) regime, we provide an asymptotically exact formula for the generalization error for rank-1 data and an approximation for the generalization error for rank-r data. Using our formulas, we show that the generalization error versus number of data points follows a double descent curve. From this, we derive a formula for the amount of noise that should be added to the training data to minimize the denoising error, and observe that this optimal noise level follows a double descent curve as well.
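To illustrate the phenomenon the abstract describes, here is a minimal simulation sketch, not the paper's exact model or formulas: a linear denoiser fit by min-norm least squares on rank-1 data with noisy inputs. The dimension d, noise level sigma, sample sizes, and sampling scheme are all illustrative assumptions; in such linear settings the test denoising error typically peaks near n = d, the interpolation threshold, tracing out a double descent curve in the number of training points.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50        # ambient dimension (illustrative choice)
sigma = 0.4   # noise level added to the inputs (illustrative choice)
n_test = 2000

# Rank-1 clean data: x = s * u for a fixed unit vector u and random scalar s
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

def sample(n):
    """Return (noisy inputs, clean targets), each of shape (n, d)."""
    s = rng.standard_normal(n)
    x = np.outer(s, u)                                # clean rank-1 signals
    y = x + sigma * rng.standard_normal((n, d))       # noisy inputs
    return y, x

y_test, x_test = sample(n_test)

for n in [10, 25, 40, 48, 50, 52, 60, 100, 400]:
    y_train, x_train = sample(n)
    # Min-norm least-squares linear denoiser: W = pinv(Y) X, shape (d, d)
    W = np.linalg.pinv(y_train) @ x_train
    err = np.mean(np.sum((y_test @ W - x_test) ** 2, axis=1))
    print(f"n={n:4d}  test denoising error={err:.3f}")
```

Running this, the error first decreases with n, spikes near n = d = 50 where the design matrix becomes ill-conditioned, and then descends again, which is the qualitative shape the abstract refers to.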
Supplementary Material: zip