Keywords: Optimization, Adaptive Methods, Convergence, Convolutional Neural Network
TL;DR: This work proposes a generic framework in which we explicitly analyze the different behaviors induced by various choices of Φ(·), and based on this framework we propose a more adaptive optimization algorithm.
Abstract: Although adaptive algorithms have achieved significant success in training deep neural networks with faster training speed, they tend to generalize worse than SGD with Momentum (SGDM). One of the state-of-the-art algorithms, Padam, was proposed to close the generalization gap of adaptive methods, but it lacks an internal explanation. This work proposes a general framework in which an explicit function Φ(·) adjusts the actual step size, and presents a more adaptive specific form, AdaPlus (Ada+). Based on this framework, we analyze the behaviors induced by different types of Φ(·): a constant function in SGDM, a linear function in Adam, a concave function in Padam, and a concave function with an offset term in AdaPlus. Empirically, we conduct experiments on classic benchmarks with both CNN and RNN architectures and achieve better performance (even better than SGDM).
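To make the framework concrete, below is a minimal NumPy sketch of an Adam-style update whose denominator is Φ(√v̂_t). The specific Φ choices (constant, linear, concave power, concave power plus offset) and all hyperparameter values are illustrative assumptions based on the abstract, not the paper's exact AdaPlus definition.

```python
import numpy as np

# Sketch of the generic Phi(.) framework described in the abstract (assumed form):
#     theta <- theta - lr * m_hat / Phi(sqrt(v_hat)),
# where m_hat, v_hat are bias-corrected first/second moment estimates and Phi
# controls how adaptive the effective step size is.

def phi_sgdm(x, c=1.0):                    # constant Phi -> SGDM-like behavior
    return np.full_like(x, c)

def phi_adam(x):                           # linear Phi -> Adam-like behavior
    return x

def phi_padam(x, p=0.125):                 # concave Phi (x^{2p}, p < 1/2) -> Padam-like behavior
    return x ** (2 * p)

def phi_adaplus(x, p=0.125, offset=1e-3):  # concave Phi with an offset term (assumed AdaPlus-style form)
    return x ** (2 * p) + offset

def step(theta, grad, state, phi, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One optimizer step under the generic Phi(.) framework (sketch)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)           # bias correction, as in Adam
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (phi(np.sqrt(v_hat)) + eps)
    return theta, (m, v, t)

# Usage: minimize f(theta) = ||theta||^2 with the AdaPlus-style Phi.
theta = np.array([1.0, -2.0])
state = (np.zeros_like(theta), np.zeros_like(theta), 0)
for _ in range(200):
    grad = 2 * theta
    theta, state = step(theta, grad, state, phi_adaplus, lr=0.1)
print(theta)  # approaches [0, 0]
```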
Code: https://anonfiles.com/daV7Ed6enb/AdaPlus_zip
Original Pdf: pdf