Multiplicative RMSprop Using Gradient Normalization for Learning Acceleration

Published: 01 Jan 2024 · Last Modified: 28 Apr 2025 · ICPR (29) 2024 · CC BY-SA 4.0
Abstract: Although deep learning (DL) architectures achieve state-of-the-art performance in a wide range of applications, such as computer vision, the training process remains highly sensitive to hyperparameters, initial weights, and data distributions, making the development of fast and stable optimization methods a challenging task. The Root Mean Square Propagation (RMSprop) method extends Stochastic Gradient Descent (SGD) with an adaptive learning-rate mechanism and has become a standard optimizer in the DL community. However, even RMSprop suffers from convergence issues caused by the high variance of gradients and learning rates at the initial stage of training. Motivated by the significant contribution of multiplicative updates to the early development of Machine Learning, and by recent preliminary results, we propose a multiplicative update term tailored to RMSprop that significantly improves its performance. More specifically, the proposed term normalizes the gradients and scales the parameters according to their magnitudes, leading to significant acceleration at the initial stage of training while producing more robust models. Based on the proposed update term, we formulate two novel RMSprop variants and demonstrate their acceleration and robustness on widely used image classification benchmarks as well as on convex and non-convex optimization tasks.
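As a rough illustration of the idea described in the abstract, the following Python sketch shows how a multiplicative, gradient-normalized term might be combined with a standard RMSprop step. This is a hypothetical reconstruction based only on the abstract: the function name multiplicative_rmsprop_step, the hyperparameter mu controlling the multiplicative term, and the exact placement of the normalization are assumptions, not the paper's formulation.

import numpy as np

def multiplicative_rmsprop_step(w, grad, v, lr=1e-3, beta=0.9, mu=0.01, eps=1e-8):
    """One optimizer step: a standard RMSprop update plus an assumed multiplicative term.

    w    : parameter vector
    grad : gradient of the loss at w
    v    : running average of squared gradients (RMSprop state)
    mu   : strength of the assumed multiplicative term (illustrative)
    """
    # Standard RMSprop second-moment accumulator.
    v = beta * v + (1.0 - beta) * grad ** 2

    # Additive RMSprop update with the usual adaptive learning rate.
    additive = lr * grad / (np.sqrt(v) + eps)

    # Assumed multiplicative term: normalize the gradient so only its direction
    # matters, then scale by the parameter magnitude so larger weights take
    # proportionally larger steps early in training.
    g_unit = grad / (np.linalg.norm(grad) + eps)
    multiplicative = mu * np.abs(w) * g_unit

    w = w - additive - multiplicative
    return w, v

In this sketch, the normalized gradient contributes only a direction, while |w| scales the multiplicative step, which mirrors the abstract's description of normalizing gradients and scaling parameters by their magnitudes; the paper's two proposed variants presumably differ in how such a term is combined with the adaptive RMSprop step.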