- Abstract: We propose a class of very simple modifications of gradient descent and stochastic gradient descent. We show that when applied to a large variety of machine learning problems, ranging from softmax regression to deep neural nets, the proposed surrogates can dramatically reduce the variance and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definitive matrix coming from the discrete Laplacian or its high order generalizations. The theory of Hamilton-Jacobi partial differential equations demonstrates that the implicit version of new algorithm is almost the same as doing gradient descent on a new function which (i) has the same global minima as the original function and (ii) is ``more convex". We show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev $H_\sigma^p$ sense and reduce the optimality gap for convex optimization problems. We implement our algorithm into both PyTorch and Tensorflow platforms which only involves changing of a few lines of code. The code will be available on Github.
- Keywords: Laplacian Smoothing, Nonconvex Optimization, Deep Learning
- TL;DR: We proposal a simple surrogate for gradient descent to improve training of deep neural nets and other optimization problems.