TL;DR: An algorithm that unifies SGD and Adam, together with an empirical study of its performance
Abstract: Stochastic gradient descent (SGD) and Adam are commonly used to optimize deep neural networks, but choosing between them usually means trading off speed, accuracy, and stability. Here we present an intuition for why these tradeoffs exist, as well as a method for unifying the two optimizers in a continuous way. This makes it possible to control how models are trained in much greater detail. We show that, with default parameters, the new algorithm equals or outperforms SGD and Adam across a range of models on image classification tasks, and outperforms SGD on language modeling tasks.
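The abstract does not spell out the update rule, so the sketch below is only a minimal illustration of the general idea of interpolating continuously between SGD and Adam, not the paper's actual method. The blending parameter `alpha` and the function `blended_step` are hypothetical names introduced here: `alpha = 1` recovers Adam's per-coordinate adaptive scaling and `alpha = 0` recovers SGD with momentum.

```python
# Minimal sketch (NOT the paper's algorithm): blend Adam's adaptive
# denominator with SGD's uniform denominator via a parameter `alpha`.
import numpy as np

def blended_step(param, grad, state, lr=1e-3, alpha=1.0,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One optimizer step. alpha=1.0 behaves like Adam; alpha=0.0
    behaves like SGD with (bias-corrected) momentum."""
    m, v, t = state["m"], state["v"], state["t"] + 1

    # First and second moment estimates, as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2

    # Bias correction.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Continuous interpolation between the two scalings.
    denom = alpha * (np.sqrt(v_hat) + eps) + (1 - alpha)
    param = param - lr * m_hat / denom

    state.update(m=m, v=v, t=t)
    return param, state

# Usage: minimize f(w) = ||w||^2 with an intermediate alpha.
w = np.ones(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
for _ in range(100):
    grad = 2 * w
    w, state = blended_step(w, grad, state, lr=0.1, alpha=0.5)
print(w)  # close to zero
```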
Keywords: Optimization, SGD, Adam, Generalization, Deep Learning