Admeta: A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers with Bidirectional Looking

Yineng Chen; Zuchao Li; Lefei Zhang; Bo Du; hai zhao

Admeta: A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers with Bidirectional Looking

Yineng Chen, Zuchao Li, Lefei Zhang, Bo Du, hai zhao

Published: 01 Feb 2023, Last Modified: 13 Feb 2023Submitted to ICLR 2023Readers: Everyone

Keywords: optimizer, double exponential moving average, bidirectional looking, Adam, SGD

TL;DR: We propose a bidirectional-looking framework, Admeta, in which a novel double exponential moving average mechanism is proposed to adaptive and non-adaptive momentum optimizers.

Abstract: Optimizer is an essential component for the success of deep learning, which guides the neural network to update the parameters according to the loss on the training set. SGD and Adam are two classical and effective optimizers on which researchers have proposed many variants, such as SGDM and RAdam. In this paper, we innovatively combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel \textsc{Admeta} (\textbf{A} \textbf{D}ouble exponential \textbf{M}oving averag\textbf{E} \textbf{T}o \textbf{A}daptive and non-adaptive momentum) optimizer framework. For backward-looking part, we propose a DEMA variant scheme, which is motivated by a metric in the stock market, to replace the common exponential moving average scheme. While in the forward-looking part, we present a dynamic lookahead strategy which asymptotically approaching a set value, maintaining its speed at early stage and high convergence performance at final stage. Based on this idea, we provide two optimizer implementations, \textsc{AdmetaR} and \textsc{AdmetaS}, the former based on RAdam and the latter based on SGDM. Through extensive experiments on diverse tasks, we find that the proposed \textsc{Admeta} optimizer outperforms our base optimizers and shows advantages over recently proposed competitive optimizers. We also provide theoretical proof of these two algorithms, which verifies the convergence of our proposed \textsc{Admeta}.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Supplementary Material: zip

Please Choose The Closest Area That Your Submission Falls Into: Optimization (eg, convex and non-convex optimization)

13 Replies

Loading