NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

Published: 20 Sept 2024, Last Modified: 20 Sept 2024 · ICOMP Publication · CC BY 4.0
Keywords: accelerated optimization methods, semi-implicit discretization, stochastic first-order methods
Abstract: Classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper, we propose a novel, robust, and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the resulting method, referred to as NAG-GS, are first studied extensively in the case of minimizing a quadratic function. This analysis allows us to derive a learning rate that is optimal in terms of the convergence rate while ensuring the stability of NAG-GS. This is achieved by a careful analysis of the spectral radius of the iteration matrix and of the covariance matrix at stationarity with respect to all hyperparameters of our method. Further, we show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models such as logistic regression, residual networks on standard computer vision datasets, Transformers on the GLUE benchmark, and recent Vision Transformers.
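To make the Gauss-Seidel idea concrete, below is a minimal sketch of a semi-implicit discretization of a Nesterov-type two-variable flow applied to a quadratic objective. The flow, its coefficients (alpha, mu), the step size, and the function name nag_gs_sketch are illustrative assumptions rather than the exact NAG-GS scheme: the point shown is only that each variable takes an implicit Euler step that already uses the freshest value of the other variable, i.e. a Gauss-Seidel sweep instead of a fully explicit (Jacobi-style) update.

```python
import numpy as np

def nag_gs_sketch(A, b, x0, step=0.1, alpha=1.0, mu=1.0, n_iter=200,
                  noise=0.0, seed=0):
    """Illustrative semi-implicit (Gauss-Seidel) discretization of a
    Nesterov-type flow on f(x) = 0.5 * x^T A x - b^T x.
    All coefficient names and the exact flow are assumptions for
    illustration, not the paper's NAG-GS scheme."""
    rng = np.random.default_rng(seed)
    x, v = x0.copy(), x0.copy()
    for _ in range(n_iter):
        # x-update: implicit Euler step for dx/dt = alpha * (v - x),
        # using the old v.
        x = (x + step * alpha * v) / (1.0 + step * alpha)
        # Gradient evaluated at the freshly updated x; in training this
        # would be a stochastic (mini-batch) gradient.
        grad = A @ x - b
        if noise > 0.0:
            grad += noise * rng.standard_normal(x.shape)
        # v-update: implicit Euler step for dv/dt = x - v - grad/mu,
        # using the new x (this is the Gauss-Seidel sweep).
        v = (v + step * (x - grad / mu)) / (1.0 + step)
    return x

# Small ill-conditioned quadratic as a sanity check.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)
x_hat = nag_gs_sketch(A, b, x0=np.zeros(2))
print("distance to minimizer:", np.linalg.norm(x_hat - x_star))
```

In a fully explicit (Jacobi-style) discretization both updates would use the old iterates; the sweep above, with each sub-step solved implicitly, is the kind of scheme whose stability region the abstract's spectral-radius analysis characterizes when choosing the learning rate.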
Submission Number: 45
