PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization

25 Sep 2019 (modified: 24 Dec 2019) · ICLR 2020 Conference Blind Submission · Readers: Everyone
  • Keywords: stochastic gradient descent, non-convex optimization, powerball function, acceleration
  • TL;DR: We propose a new class of optimizers for accelerated non-convex optimization via a nonlinear gradient transformation.
  • Abstract: In this paper, we propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term \emph{PowerSGD}. The proposed PowerSGD method simply raises the stochastic gradient to a certain power $\gamma\in[0,1]$ at each iteration and introduces only one additional parameter, the power exponent $\gamma$ (when $\gamma=1$, PowerSGD reduces to SGD). We further propose PowerSGD with momentum, which we term \emph{PowerSGDM}, and provide convergence rate analyses for both PowerSGD and PowerSGDM. Experiments are conducted on popular deep learning models and benchmark datasets. Empirical results show that PowerSGD and PowerSGDM achieve faster initial training than adaptive gradient methods, generalization comparable to SGD, and improved robustness to hyper-parameter selection and vanishing gradients. PowerSGD is essentially a gradient modifier via a nonlinear transformation; as such, it is orthogonal and complementary to other techniques for accelerating gradient-based optimization. (A minimal sketch of the update rule is given after this list.)
  • Code: https://www.dropbox.com/s/kqfyq4xgelqdge3/PowerSGD_ICLR20_code.zip
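For concreteness, here is a minimal sketch of the update rule described in the abstract, assuming the power transform is the elementwise signed power $\mathrm{sign}(g)\,|g|^{\gamma}$ and that momentum is applied to the transformed gradient; the authors' code (linked above) is authoritative for the exact formulation. The names `powerball` and `powersgdm_step` and the sample values of `gamma` and `momentum` are illustrative, not from the paper.

```python
import numpy as np

def powerball(g, gamma):
    """Elementwise signed power transform: sign(g) * |g|**gamma.

    With gamma = 1 this is the identity, recovering plain SGD.
    For gamma < 1, components with |g| < 1 are amplified, which is
    consistent with the claimed robustness to vanishing gradients.
    """
    return np.sign(g) * np.abs(g) ** gamma

def powersgdm_step(w, grad, velocity, lr=0.1, gamma=0.6, momentum=0.9):
    """One PowerSGDM update (set momentum = 0 for plain PowerSGD).

    Assumption: the momentum buffer accumulates the transformed
    gradient; the ordering relative to the transform may differ
    in the authors' implementation.
    """
    velocity = momentum * velocity + powerball(grad, gamma)
    w = w - lr * velocity
    return w, velocity
```

Note that, as a pure gradient modifier, `powerball` could equally be dropped in front of any other gradient-based update, which is what the abstract means by the method being orthogonal and complementary to other acceleration techniques.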