Input Normalized Stochastic Gradient Descent Training for Deep Neural Networks

TMLR Paper 2333 Authors

05 Mar 2024 (modified: 19 Mar 2024) · Under review for TMLR
Abstract: In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with l1- and l2-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels across different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-20, Vision Transformer, MobileNetV3, WResNet-18, ResNet-50, and a toy neural network. Our INSGD algorithm improves the mean accuracy of ResNet-20 on CIFAR-10 from 92.57% to 92.67%, the accuracy of MobileNetV3 on CIFAR-10 from 90.83% to 91.13%, WResNet-18 on CIFAR-100 from 78.24% to 78.47%, and ResNet-50 on ImageNet-1K from 75.60% to 75.92%.
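
For concreteness, below is a minimal PyTorch sketch of the update rule as we read it from the abstract: an SGD step whose learning rate is normalized by the l1 or l2 norm of the neuron's input, analogous to NLMS but without the error term in the normalization. This is an illustrative reading, not the authors' implementation; the function name insgd_step, the eps stabilizer, and the single-neuron setting are assumptions, and the paper's exact formulation may differ.

```python
import torch

def insgd_step(weight, grad, layer_input, lr=0.1, eps=1e-8, norm="l2"):
    """One hypothetical INSGD update for a single linear neuron.

    The SGD update is scaled by a learning rate normalized with the
    l1 or l2 norm of the neuron's input vector (NLMS-style), with the
    error term excluded from the normalization. Sketch only; the
    paper's exact normalization may differ.
    """
    if norm == "l2":
        scale = eps + layer_input.pow(2).sum()   # ||x||_2^2, as in NLMS
    else:
        scale = eps + layer_input.abs().sum()    # ||x||_1 variant
    return weight - (lr / scale) * grad

# Toy usage: one neuron y = w.x with squared loss against target t.
torch.manual_seed(0)
x = torch.randn(8)
w = torch.randn(8, requires_grad=True)
t = torch.tensor(1.0)
loss = (w @ x - t).pow(2)
loss.backward()
with torch.no_grad():
    w_new = insgd_step(w, w.grad, x)
```

When the input norm is large, the effective step size shrinks, which mirrors how NLMS keeps its updates stable regardless of input scale.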
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=7u9OfPTHVm&noteId=FkgBUMOU3c
Changes Since Last Submission: The previous submission was withdrawn rather than rejected, due to time constraints; we requested more time to incorporate the changes. In this new submission: 1) we included more experiments to validate the improvement provided by our optimization algorithm; 2) we added a Vision Transformer (ViT) model; 3) the theoretical background is improved with additional references.
Assigned Action Editor: ~Konstantin_Mishchenko1
Submission Number: 2333