PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization

25 Sep 2019 (modified: 24 Dec 2019) · ICLR 2020 Conference Blind Submission · Readers: Everyone
  • Keywords: stochastic gradient descent, non-convex optimization, powerball function, acceleration
  • TL;DR: We propose a new class of optimizers for accelerated non-convex optimization via a nonlinear gradient transformation.
  • Abstract: In this paper, we propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term \emph{PowerSGD}. The proposed PowerSGD method simply raises the stochastic gradient to a certain power $\gamma\in[0,1]$ at each iteration and introduces only one additional parameter, the power exponent $\gamma$ (when $\gamma=1$, PowerSGD reduces to SGD). We further propose PowerSGD with momentum, which we term \emph{PowerSGDM}, and provide convergence rate analyses for both PowerSGD and PowerSGDM. Experiments are conducted on popular deep learning models and benchmark datasets. Empirical results show that PowerSGD and PowerSGDM achieve faster initial training than adaptive gradient methods, generalization comparable to SGD, and improved robustness to hyper-parameter selection and vanishing gradients. PowerSGD is essentially a gradient modifier via a nonlinear transformation; as such, it is orthogonal and complementary to other techniques for accelerating gradient-based optimization. (A minimal sketch of the update rule is given after this list.)
  • Code: https://www.dropbox.com/s/kqfyq4xgelqdge3/PowerSGD_ICLR20_code.zip
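For concreteness, here is a minimal sketch of the update rule described in the abstract, assuming the power transform is the elementwise signed power $\mathrm{sign}(g)\,|g|^{\gamma}$ and that momentum is applied to the transformed gradient; the authors' code (linked above) is authoritative for the exact formulation. The names `powerball` and `powersgdm_step` and the sample values of `gamma` and `momentum` are illustrative, not from the paper.

```python
import numpy as np

def powerball(g, gamma):
    """Elementwise signed power transform: sign(g) * |g|**gamma.

    With gamma = 1 this is the identity, recovering plain SGD.
    For gamma < 1, components with |g| < 1 are amplified, which is
    consistent with the claimed robustness to vanishing gradients.
    """
    return np.sign(g) * np.abs(g) ** gamma

def powersgdm_step(w, grad, velocity, lr=0.1, gamma=0.6, momentum=0.9):
    """One PowerSGDM update (set momentum = 0 for plain PowerSGD).

    Assumption: the momentum buffer accumulates the transformed
    gradient; the ordering relative to the transform may differ
    in the authors' implementation.
    """
    velocity = momentum * velocity + powerball(grad, gamma)
    w = w - lr * velocity
    return w, velocity
```

Note that, as a pure gradient modifier, `powerball` could equally be dropped in front of any other gradient-based update, which is what the abstract means by the method being orthogonal and complementary to other acceleration techniques.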