Abstract: Optimization algorithms with momentum, e.g., ADAM, help accelerate SGD in parameter updating and can reduce oscillations along the parameter-update route. However, a fixed momentum weight (e.g., β₁ in ADAM) propagates errors in the momentum computation. Moreover, such a hyperparameter can be extremely hard to tune in applications. In this paper, we introduce a novel optimization algorithm, namely Discriminative wEight on Adaptive Momentum (DEAM). DEAM computes the momentum weight automatically based on the discriminative angle, so that the momentum term is assigned an appropriate weight that configures its influence in the current step. In addition, DEAM contains a novel backtrack term, which restricts redundant updates when the previous step needs to be corrected. The backtrack term can effectively adapt the learning rate and achieve an anticipatory update as well. Extensive experiments demonstrate that DEAM achieves a faster convergence rate than existing optimization algorithms when training various models. A full version of this paper can be accessed in [1].
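To make the idea concrete, below is a minimal, hypothetical sketch of a DEAM-style update step in Python. The cosine-based "discriminative angle", the mapping from that angle to the momentum weight β, and the form of the backtrack correction are all assumptions for illustration only; the paper [1] defines the actual rules.

```python
import numpy as np

def deam_step(theta, grad, m, lr=1e-3, prev_update=None, eps=1e-8):
    """One DEAM-style update (illustrative sketch, not the paper's exact rule).

    theta: current parameters
    grad: current gradient
    m: accumulated momentum from previous steps
    prev_update: previous parameter update (used by the backtrack term)
    """
    # Discriminative angle between the new gradient and the accumulated
    # momentum, measured here via cosine similarity (assumed formulation).
    cos_angle = np.dot(grad, m) / (np.linalg.norm(grad) * np.linalg.norm(m) + eps)

    # Adaptive momentum weight: the more the gradient agrees with the
    # momentum direction, the more weight the momentum term receives
    # (assumed mapping from angle to weight).
    beta = 0.5 * (1.0 + cos_angle)

    # Momentum update with the adaptively computed weight.
    m_new = beta * m + (1.0 - beta) * grad

    # Backtrack term: when the new gradient opposes the momentum direction
    # (obtuse angle), partially retract the previous update instead of
    # compounding it (assumed form of the correction).
    backtrack = np.zeros_like(theta)
    if prev_update is not None and cos_angle < 0.0:
        backtrack = -cos_angle * prev_update

    update = -lr * m_new + backtrack
    return theta + update, m_new, update
```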