Input Normalized Stochastic Gradient Descent Training for Deep Neural Networks

TMLR Paper 2333 Authors

05 Mar 2024 (modified: 19 Mar 2024) · Under review for TMLR
Abstract: In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with l1- and l2-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels across different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-20, Vision Transformer, MobileNetV3, WResNet-18, ResNet-50, and a toy neural network. Our INSGD algorithm improves the mean accuracy of ResNet-20 on CIFAR-10 from 92.57% to 92.67%, the accuracy of MobileNetV3 on CIFAR-10 from 90.83% to 91.13%, WResNet-18 on CIFAR-100 from 78.24% to 78.47%, and ResNet-50 on ImageNet-1K from 75.60% to 75.92%.
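
For concreteness, below is a minimal PyTorch sketch of the update rule as we read it from the abstract: an SGD step whose learning rate is normalized by the l1 or l2 norm of the neuron's input, analogous to NLMS but without the error term in the normalization. This is an illustrative reading, not the authors' implementation; the function name insgd_step, the eps stabilizer, and the single-neuron setting are assumptions, and the paper's exact formulation may differ.

```python
import torch

def insgd_step(weight, grad, layer_input, lr=0.1, eps=1e-8, norm="l2"):
    """One hypothetical INSGD update for a single linear neuron.

    The SGD update is scaled by a learning rate normalized with the
    l1 or l2 norm of the neuron's input vector (NLMS-style), with the
    error term excluded from the normalization. Sketch only; the
    paper's exact normalization may differ.
    """
    if norm == "l2":
        scale = eps + layer_input.pow(2).sum()   # ||x||_2^2, as in NLMS
    else:
        scale = eps + layer_input.abs().sum()    # ||x||_1 variant
    return weight - (lr / scale) * grad

# Toy usage: one neuron y = w.x with squared loss against target t.
torch.manual_seed(0)
x = torch.randn(8)
w = torch.randn(8, requires_grad=True)
t = torch.tensor(1.0)
loss = (w @ x - t).pow(2)
loss.backward()
with torch.no_grad():
    w_new = insgd_step(w, w.grad, x)
```

When the input norm is large, the effective step size shrinks, which mirrors how NLMS keeps its updates stable regardless of input scale.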
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=7u9OfPTHVm&noteId=FkgBUMOU3c
Changes Since Last Submission: The previous submission was withdrawn rather than rejected, due to time constraints; we requested more time to incorporate the changes. In this new submission: 1) we included more experiments to validate the improvement provided by our optimization algorithm; 2) we added a Vision Transformer (ViT) model; 3) the theoretical background is improved with additional references.
Assigned Action Editor: ~Konstantin_Mishchenko1
Submission Number: 2333