Noise Injection Irons Out Local Minima and Saddle Points

Published: 26 Oct 2023, Last Modified: 13 Dec 2023, NeurIPS 2023 Workshop Poster
Keywords: smoothing, nonconvex, two-layer NN
TL;DR: Noise injection can help neural network training - but not always.
Abstract: Non-convex optimization problems are ubiquitous in machine learning, especially in deep learning. It has been observed in practice that injecting artificial noise into stochastic gradient descent (SGD) can sometimes improve training and generalization performance. In this work, we formalize noise injection as a smoothing operator, and we review and derive convergence guarantees for SGD under smoothing. Empirically, we find that Gaussian smoothing works well for training two-layer neural networks, but that these findings do not carry over to deeper networks. We hope this contribution stimulates discussion in the community and further investigation of the impact of noise on training machine learning models.
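To make the smoothing view concrete, here is a minimal sketch under assumptions that the abstract does not spell out. It takes the smoothing operator to be the standard Gaussian one, f_sigma(x) = E[f(x + sigma * u)] with u ~ N(0, I), and treats noise injection as evaluating the gradient at a perturbed point, which gives a stochastic gradient of f_sigma. The test objective `f`, the helper names, and all parameter values are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def f(x):
    # Hypothetical 1-D nonconvex objective (an assumption, not from the paper):
    # a quadratic bowl with sinusoidal ripples that create local minima.
    return x**2 + 2.0 * np.sin(5.0 * x)


def grad_f(x):
    # Exact derivative of f.
    return 2.0 * x + 10.0 * np.cos(5.0 * x)


def smoothed_value(x, sigma, n_samples, rng):
    # Monte Carlo estimate of f_sigma(x) = E[f(x + sigma * u)], u ~ N(0, 1).
    u = rng.standard_normal(n_samples)
    return np.mean(f(x + sigma * u))


def smoothed_value_exact(x, sigma):
    # Closed form for this particular f:
    # E[(x + sigma*u)^2] = x^2 + sigma^2 and
    # E[sin(5*(x + sigma*u))] = sin(5*x) * exp(-12.5 * sigma^2),
    # so larger sigma damps the ripples exponentially.
    return x**2 + sigma**2 + 2.0 * np.sin(5.0 * x) * np.exp(-12.5 * sigma**2)


def noisy_sgd(x0, sigma, lr, n_steps, rng):
    # Gradient descent with noise injected at the evaluation point.
    # grad_f(x + sigma*u) is an unbiased estimate of the gradient of f_sigma.
    x = x0
    for _ in range(n_steps):
        u = rng.standard_normal()
        x = x - lr * grad_f(x + sigma * u)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("Monte Carlo:", smoothed_value(1.0, 0.5, 200_000, rng))
    print("Closed form:", smoothed_value_exact(1.0, 0.5))
    print("Final iterate:", noisy_sgd(3.0, 0.5, 0.01, 500, rng))
```

In this toy case the closed form shows the ripple amplitude shrinking by a factor of exp(-12.5 * sigma^2), which is one way to picture how smoothing could flatten shallow local minima. It says nothing about whether the same effect holds for neural network losses.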
Submission Number: 104